Neoepitope detection of disease using protein arrays

ABSTRACT

A biosensor for use in detecting the presence of diseases, the biosensor comprising a detector for detecting a presence of at least one marker indicative of a specific disease. A method of determining efficacy of a pharmaceutical for treating a disease or staging disease by administering a pharmaceutical to a sample containing markers for a disease, detecting the amount of at least one marker of the disease in the sample, and analyzing the amount of the marker in the sample, whereby the amount of marker correlates to pharmaceutical efficacy or disease stage. Markers for gynecological disease. An immuno-imaging agent comprising labeled antibodies, whereby the labeled antibodies are isolated and reactive to proteins overexpressed in vivo. Informatics software for analyzing the arrays, the software including analyzing means for analyzing the arrays.

GRANT INFORMATION

Research in this application was supported in part by a grant from the National Institutes of Health (NIH Grant No. IR21CA100740-01) and Michigan Economic Development Grant MEDCO3-538. The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to an assay and method for diagnosing disease. More specifically, the present invention relates to an immunoassay for use in diagnosing cancer.

2. Background Art

It is commonly known in the art that genetic mutations can be used for detecting cancer. For example, the tumorigenic process leading to colorectal carcinoma formation involves multiple genetic alterations (Fearon et al (1990) Cell 61, 759-767). Tumor suppressor genes such as p53, DCC and APC are frequently inactivated in colorectal carcinomas, typically by a combination of genetic deletion of one allele and point mutation of the second allele (Baker et al (1989) Science 244, 217-221; Fearon et al (1990) Science 247, 49-56; Nishisho et al (1991) Science 253, 665-669; and Groden et al (1991) Cell 66, 589-600). Mutation of two mismatch repair genes that regulate genetic stability was associated with a form of familial colon cancer (Fishel et al (1993) Cell 75, 1027-1038; Leach et al (1993) Cell 75, 1215-1225; Papadopoulos et al (1994) Science 263, 1625-1629; and Bronner et al (1994) Nature 368, 258-261). Proto-oncogenes such as myc and ras are altered in colorectal carcinomas, with c-myc RNA being overexpressed in as many as 65% of carcinomas (Erisman et al (1985) Mol. Cell. Biol. 5, 1969-1976), and ras activation by point mutation occurring in as many as 50% of carcinomas (Bos et al (1987) Nature 327, 293-297; and Forrester et al (1987) Nature 327, 298-303). Other proto-oncogenes, such as myb and neu, are activated with a much lower frequency (Alitalo et al (1984) Proc. Natl. Acad. Sci. USA 81, 4534-4538; and D'Emilia et al (1989) Oncogene 4, 1233-1239). No common series of genetic alterations is found in all colorectal tumors, suggesting that a variety of such combinations can generate these tumors.

Increased tyrosine phosphorylation is a common element in signaling pathways that control cell proliferation. The deregulation of protein tyrosine kinases (PTKs) through overexpression or mutation has been recognized as an important step in cell transformation and tumorigenesis, and many oncogenes encode PTKs (Hunter (1989) in Oncogenes and the Molecular Origins of Cancer, ed. Weinberg (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), pp. 147-173). Numerous studies have addressed the involvement of PTKs in human tumorigenesis. Activated PTKs associated with colorectal carcinoma include c-neu (amplification), trk (rearrangement), and c-src and c-yes (mechanism unknown) (D'Emilia et al (1989), ibid; Martin-Zanca et al (1986) Nature 3, 743-748; Bolen et al (1987) Proc. Natl. Acad. Sci. USA 84, 2251-2255; Cartwright et al (1989) J. Clin. Invest. 83, 2025-2033; Cartwright et al (1990) Proc. Natl. Acad. Sci. USA 87, 558-562; Talamonti et al (1993) J. Clin. Invest. 91, 53-60; and Park et al (1993) Oncogene 8, 2627-2635).

Mutations such as those disclosed above can be useful in detecting cancer. However, there have been few advancements which can repeatably be used in diagnosing cancer prior to the existence of a tumor. For example, breast cancer, which is by far the most common form of cancer in women, is the second leading cause of cancer death in humans. Despite many recent advances in diagnosing and treating breast cancer, the prevalence of this disease has been steadily rising at a rate of about 1% per year since 1940. Today, the likelihood that a woman living in North America will develop breast cancer during her lifetime is one in eight.

The current widespread use of mammography has resulted in improved detection of breast cancer. Nonetheless, the death rate due to breast cancer has remained unchanged at about 27 deaths per 100,000 women. All too often, breast cancer is discovered at a stage that is too far advanced, when therapeutic options and survival rates are severely limited. Accordingly, more sensitive and reliable methods are needed to detect small (less than 2 cm diameter), early stage, in situ carcinomas of the breast. Such methods should significantly improve breast cancer survival, as suggested by the successful employment of Papanicolaou smears for early detection and treatment of cervical cancer.

In addition to the problem of early detection, there remain serious problems in distinguishing between malignant and benign breast disease, in staging known breast cancers, and in differentiating between different types of breast cancers (e.g., estrogen dependent versus non-estrogen dependent tumors). Recent efforts to develop improved methods for breast cancer detection, staging and classification have focused on a promising array of so-called cancer “markers.” Cancer markers are typically proteins that are uniquely expressed (e.g., as a cell surface or secreted protein) by cancerous cells, or are expressed at measurably increased or decreased levels by cancerous cells compared to normal cells. Other cancer markers can include specific DNA or RNA sequences marking deleterious genetic changes or alterations in the patterns or levels of gene expression associated with particular forms of cancer.

The utility of specific breast cancer markers for screening and diagnosis, staging and classification, monitoring and/or therapy purposes depends on the nature and activity of the marker in question. For general reviews of breast cancer markers, see Porter-Jordan et al., Hematol. Oncol. Clin. North Amer. 8: 73-100, 1994; and Greiner, Pharmaceutical Tech., May, 1993, pp. 28-44. As reflected in these reviews, a primary focus for developing breast cancer markers has centered on the overlapping areas of tumorigenesis, tumor growth and cancer invasion. Tumorigenesis and tumor growth can be assessed using a variety of cell proliferation markers (for example Ki67, cyclin D1 and proliferating cell nuclear antigen (PCNA)), some of which can be important oncogenes as well. Tumor growth can also be evaluated using a variety of growth factor and hormone markers (for example estrogen, epidermal growth factor (EGF), erbB-2, transforming growth factor (TGF)-α), which can be overexpressed, underexpressed or exhibit altered activity in cancer cells. By the same token, receptors of autocrine or exocrine growth factors and hormones (for example insulin-like growth factor (IGF) receptors, and EGF receptor) can also exhibit changes in expression or activity associated with tumor growth. Lastly, tumor growth is supported by angiogenesis involving the elaboration and growth of new blood vessels and the concomitant expression of angiogenic factors that can serve as markers for tumorigenesis and tumor growth.

In addition to tumorigenic, proliferation and growth markers, a number of markers have been identified that can serve as indicators of invasiveness and/or metastatic potential in a population of cancer cells. These markers generally reflect altered interactions between cancer cells and their surrounding microenvironment. For example, when cancer cells invade or metastasize, detectable changes can occur in the expression or activity of cell adhesion or motility factors, examples of which include the cancer markers Cathepsin D, plasminogen activators, collagenases and other factors. In addition, decreased expression or overexpression of several putative tumor “suppressor” genes (for example nm23, p53 and rb) has been directly associated with increased metastatic potential or deregulation of growth predictive of poor disease outcome.

Additionally, ovarian cancer has the highest mortality rate of all gynecological cancers and yet there is still no reliable and easy to administer screening test. Using the multimodality approach to treatment, including aggressive cytoreductive surgery in combination with chemotherapy, five-year survival rates diminish with increasing stage: Stage I (93%), Stage II (70%), Stage III (37%), and Stage IV (25%). Despite advances in molecular biology, surgical oncology, and chemotherapy, the overall prognosis for ovarian cancer patients diagnosed at Stages II-IV remains poor. The excellent survival rates for Stage I disease provide the rationale for efforts to detect early-stage ovarian cancer through screening. The first priority of any screening procedure for ovarian cancer is high specificity in order to minimize the number of false positive results and thereby ensure an acceptable positive predictive value (PPV). There have been no effective and reliable tests developed to date.

Screening for ovarian cancer has been based on strategies using serum tumor markers or ultrasound imaging of the ovaries. The most extensively investigated biomarker is CA-125, whose serum levels are elevated in 50% of Stage I and 90% of Stage II ovarian cancer patients. However, elevated CA-125 levels have also been observed in healthy women during menstruation, in patients with other gynecological diseases, and in patients with other malignancies, which suggests that the false-positive rate of CA-125 can be high.

In contrast to detection of serum antigens, the detection of serum antibody responses to tumor antigens may provide a more reliable serum marker for cancer diagnosis because serum antibodies are more stable than serum antigens. Furthermore, antibodies may be more abundant than antigens, especially at low tumor burdens characteristic of early stages. Thirty percent of patients with ductal carcinoma in situ (DCIS) in which the protooncogene HER2/neu was overexpressed had serum antibodies specific to this protein. In addition, antibodies to p53 have been reported in patients with early-stage ovarian and colorectal cancers. Antibodies against heat shock protein 90 (HSP90) were also found to be associated with patients' survival and tumor metastasis. Antibodies against ribosomal proteins may constitute a novel serological marker. The presence of antibodies to ubiquitin C-terminal hydrolase L3 in colon cancer has also been reported. Changes in the level of gene expression in cancer and aberrant expression of tissue-restricted gene products in cancer are factors in the development of a humoral immune response in cancer patients. In this respect, serological analysis of recombinant cDNA expression libraries (SEREX) of human tumors with autologous serum has identified some relevant tumor antigens. Among the gene products shown to be immunogenic are MAGE, SSX2, and NY-ESO-1, which are expressed in various tumor types, but not in normal tissues except testis.

Studies on new technology based on proteomic patterns in serum to screen for early stage ovarian cancer have been reported by Petricoin et al. (2002). The procedure involved generating proteomic spectra of serum proteins using matrix-assisted laser desorption and ionization time-of-flight (MALDI-TOF) and surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) mass spectroscopy. In independent validation to detect early stage invasive epithelial ovarian cancer from healthy controls, the sensitivity of a multivariate model combining the three biomarkers and CA125 [74% (95% CI, 52-90%)] was higher than that of CA125 alone [65% (95% CI, 43-84%)] at a matched specificity of 97% (95% CI, 89-100%). When compared at a fixed sensitivity of 83% (95% CI, 61-95%), the specificity of the model [94% (95% CI, 85-98%)] was significantly better than that of CA125 alone [52% (95% CI, 39-65%)]. Due to the low prevalence of ovarian cancer in the general population, this level of specificity is unacceptable for a realistic ovarian cancer diagnostic test. Assuming that in a clinical setting with low-risk patients, ovarian cancer is present in approximately one per 2500 patients, the (MALDI/SELDI) approach would produce 125 false positives for every true cancer patient. Furthermore, some issues have arisen regarding the mass spectroscopy technology of protein profiling. It has been reported that the data obtained by this technology are difficult to reproduce and that they may be biased by artifacts in sample preparation, storage and processing, and patient selection.
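The false-positive arithmetic in the preceding paragraph can be checked with a short sketch. The text does not state which sensitivity and specificity underlie the figure of 125; the values below (perfect sensitivity, 95% specificity) are assumptions chosen because they reproduce roughly that number at a prevalence of one in 2500. The function names are illustrative only.

```python
def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Expected false positives for each true positive in a screened population."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return false_positives / true_positives


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive test results that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Prevalence of one case per 2500 low-risk patients (from the text).
prevalence = 1 / 2500

# Assumed operating point: sensitivity 100%, specificity 95% (not stated in the text).
ratio_assumed = false_positives_per_true_positive(prevalence, 1.0, 0.95)
ppv_assumed = positive_predictive_value(prevalence, 1.0, 0.95)

# Operating point reported for the multivariate model: sensitivity 83%, specificity 94%.
ratio_reported = false_positives_per_true_positive(prevalence, 0.83, 0.94)

print(f"assumed point:  {ratio_assumed:.1f} false positives per true positive, PPV {ppv_assumed:.4f}")
print(f"reported point: {ratio_reported:.1f} false positives per true positive")
```

Under either operating point the positive predictive value stays below 1%, which is why the paragraph treats specificity in the mid-90% range as inadequate for screening a low-prevalence population.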

In summary, the evaluation of proliferation markers, oncogenes, growth factors and growth factor receptors, angiogenic factors, proteases, adhesion factors and tumor suppressor genes, among other cancer markers, can provide important information concerning the risk, presence, status or future behavior of cancer in a patient. Determining the presence or level of expression or activity of one or more of these cancer markers can aid in the differential diagnosis of patients with uncertain clinical abnormalities, for example by distinguishing malignant from benign abnormalities. Furthermore, in patients presenting with established malignancy, cancer markers can be useful to predict the risk of future relapse, or the likelihood of response in a particular patient to a selected therapeutic course. Even more specific information can be obtained by analyzing highly specific cancer markers, or combinations of markers, which can predict responsiveness of a patient to specific drugs or treatment options.

Methods for detecting and measuring cancer markers have been recently revolutionized by the development of immunological assays, particularly by assays that utilize monoclonal antibody technology. Previously, many cancer markers could only be detected or measured using conventional biochemical assay methods, which generally require large test samples and are therefore unsuitable in most clinical applications. In contrast, modern immunoassay techniques can detect and measure cancer markers in much smaller samples, particularly when monoclonal antibodies that specifically recognize a targeted marker protein are used. Accordingly, it is now routine to assay for the presence or absence, level, or activity of selected cancer markers by immunohistochemically staining tissue specimens obtained via conventional biopsy methods. Because of the highly sensitive nature of immunohistochemical staining, these methods have also been successfully employed to detect and measure cancer markers in smaller, needle biopsy specimens which require less invasive sample gathering procedures compared to conventional biopsy specimens. In addition, other immunological methods have been developed and are now well known in the art that allow for detection and measurement of cancer markers in non-cellular samples such as serum and other biological fluids from patients. The use of these alternative sample sources substantially reduces the morbidity and costs of assays compared to procedures employing conventional biopsy samples, which allows for application of cancer marker assays in early screening and low risk monitoring programs where invasive biopsy procedures are not indicated.

For the purpose of cancer evaluation, the use of conventional or needle biopsy samples for cancer marker assays is often undesirable, because a primary goal of such assays is to detect the cancer before it progresses to a palpable or detectable tumor stage. Prior to this stage, biopsies are generally contraindicated, making early screening and low risk monitoring procedures employing such samples untenable. Therefore, there is a general need in the art to obtain samples for cancer marker assays by less invasive means than biopsy, for example by serum withdrawal.

Efforts to utilize serum samples for cancer marker assays have met with limited success, largely because either the targeted markers are not detectable in serum, or telltale changes in the levels or activity of the markers cannot be monitored in serum. In addition, the presence of cancer markers in serum probably occurs at the time of micro-metastasis, making serum assays less useful for detecting pre-metastatic disease.

Previous attempts to develop non-invasive breast cancer marker assays utilizing mammary fluid samples have included studies of mammary fluid obtained from patients presenting with spontaneous nipple discharge. In one of these studies, conducted by Inaji et al., Cancer 60: 3008-3013, 1987, levels of the breast cancer marker carcinoembryonic antigen (CEA) were measured using conventional, enzyme linked immunoassay (ELISA) and sandwich-type, monoclonal immunoassay methods. These methods successfully and reproducibly demonstrated that CEA levels in spontaneously discharged mammary fluid provide a sensitive indicator of nonpalpable breast cancer. In a subsequent study, also by Inaji et al., Jpn. J. Clin. Oncol. 19: 373-379, 1989, these results were expanded using a more sensitive, dry chemistry, dot-immunobinding assay for CEA determination. This latter study reported that elevated CEA levels occurred in 43% of patients tested with palpable breast tumors, and in 73% of patients tested with nonpalpable breast tumors. CEA levels in the discharged mammary fluid were highly correlated with intratumoral CEA levels, indicating that the level of CEA expression by breast cancer cells is closely reflected in the mammary fluid CEA content. Based on these results, the authors concluded that immunoassays for CEA in spontaneously discharged mammary fluid are useful for screening nonpalpable breast cancer.

Although the evaluation of mammary fluid has been shown to be a useful method for screening nonpalpable breast cancer in women who experience spontaneous nipple discharge, the rarity of this condition renders the methods of Inaji et al. inapplicable to the majority of women who are candidates for early breast cancer screening. In addition, the first Inaji report cited above determined that certain patients suffering spontaneous nipple discharge secrete less than 10 μl of mammary fluid, which is a critically low level for the ELISA and sandwich immunoassays employed in that study. It is likely that other antibodies used to assay other cancer markers can exhibit even lower sensitivity than the anti-CEA antibodies used by Inaji and coworkers, and may therefore not be adaptable or sensitive enough to be employed even in dry chemical immunoassays of small samples of spontaneously discharged mammary fluid.

In view of the above, an important need exists in the art for more widely applicable, non-invasive methods and materials to obtain biological samples for use in evaluating, diagnosing and managing breast and other diseases including cancer, particularly for screening early stage, nonpalpable tumors. A related need exists for methods and materials that utilize such readily obtained biological samples to evaluate, diagnose and manage disease, particularly by detecting or measuring selected cancer markers, or panels of cancer markers, to provide highly specific, cancer prognostic and/or treatment-related information, and to diagnose and manage pre-cancerous conditions, cancer susceptibility, bacterial and other infections, and other diseases.

With specific regard to such assays, specific antibodies can only be measured by detecting binding to their antigen or a mimic thereof. Although certain classes of immunoglobulins containing the antibodies of interest can, in some cases, be separated from the sample prior to the assay (Decker, et al., EP 0,168,689 A2), in all assays, at least some portion of the sample immunoglobulins are contacted with antigen. For example, in assays for specific IgM, a portion of the total IgM can be adsorbed to a surface and the sample removed prior to detection of the specific IgM by contacting with antigen. Binding is then measured by detection of the bound antibody, detection of the bound antigen or detection of the free antigen.

For detection of bound antibody, a labeled anti-human immunoglobulin or labeled antigen is normally allowed to bind antibodies that have been specifically adsorbed from the sample onto a surface coated with the antigen, Bolz, et al., U.S. Pat. No. 4,020,151. Excess reagent is washed away and the label that remains bound to the surface is detected. This is the procedure in the most frequently used assays, for example, for hepatitis and human immunodeficiency virus and for numerous immunohistochemical tests, Nakamura, et al., Arch Pathol Lab Med 112:869-877 (1988). Although this method is relatively sensitive, it is subject to interference from non-specific binding to the surface by non-specific immunoglobulins that cannot be differentiated from the specific immunoglobulins.

Another method of detecting bound antibodies involves combining the sample and a competing labeled antibody with a support-bound antigen, Schuurs, et al., U.S. Pat. No. 3,654,090. This method has its limitations because antibodies in sera bind numerous epitopes, making competition inefficient.

For detection of bound antigen, the antigen can be used in excess of the maximum amount of antibody that is present in the sample or in an amount that is less than the amount of antibody. For example, radioimmunoprecipitation (“RIP”) assays for GAD autoantibodies have been developed and are currently in use, Atkinson, et al., Lancet 335:1357-1360 (1990). However, attempts to convert this assay to an enzyme linked immunosorbent assay (“ELISA”) format have not been successful. The RIP assay is based on precipitation of immunoglobulins in human sera, and led to the development of a radioimmunoassay (“RIA”) for GAD autoantibodies. In both the RIP and the RIA, the antigen is added in excess and the bound antigen:antibody complex is precipitated with protein A-Sepharose. The complex is then washed or further separated by electrophoresis and the antigen in the complex is detected.

Other precipitating agents can be used such as rheumatoid factor or C1q, Masson, et al., U.S. Pat. No. 4,062,935; polyethylene glycol, Soeldner, et al., U.S. Pat. No. 4,855,242; and protein A, Ito, et al., EP 0,410,893 A2. The precipitated antigen can be measured to indicate the amount of antibody in the sample; the amount of antigen remaining in solution can be measured; or both the precipitated antigen and the soluble antigen can be measured to correct for any labeled antigen that is non-specifically precipitated. These methods, while quite sensitive, are all difficult to carry out because of the need for rigorous separation of the free antigen from the bound complex, which requires at a minimum filtration or centrifugation and multiple washing of the precipitate.

Alternatively, detection of the bound antigen can be employed when the amount of antigen is less than the maximum amount of antibody. Normally, that is carried out using particles such as latex particles or erythrocytes that are coated with the antigen, Cambiaso, et al., U.S. Pat. No. 4,184,849 and Uchida, et al., EP 0,070,527 A1. Antibodies can specifically agglutinate these particles and can then be detected by light scattering or other methods. It is necessary in these assays to use a precise amount of antigen, because too little antigen provides an assay response that is biphasic, so that high antibody titers can be read as negative, while too much antigen adversely affects the sensitivity. It is therefore necessary to carry out sequential dilutions of the sample to assure that positive samples are not missed. Further, these assays tend to detect only antibodies with relatively high affinities, and the sensitivity of the method is compromised by the tendency for all of the binding sites of each antibody to bind to the antigen on the particle to which it first binds, leaving no sites for binding to the other particle.

For assays in which the free antigen is detected, the antigen can also be added in excess or in a limited amount, although only the former has been reported. Assays of this type have been described where an excess of antigen is added to the sample, the immunoglobulins are precipitated, and the antigen remaining in the solution is measured, Masson, et al., supra and Soeldner, et al., supra. These assays are relatively insensitive because only a small percentage change in the amount of free antigen occurs with low amounts of antibody, and this small percentage is difficult to measure accurately.
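The insensitivity of excess-antigen formats can be illustrated with a simple stoichiometric sketch. It assumes, purely for illustration, that every antibody binding site removes one antigen molecule from solution; the unit amounts are hypothetical and are not taken from any cited assay.

```python
def percent_drop_in_free_antigen(total_antigen, antibody_sites):
    """Percentage of antigen removed from solution under complete 1:1 binding (an assumption)."""
    bound = min(total_antigen, antibody_sites)
    return 100.0 * bound / total_antigen


# Hypothetical amounts in arbitrary units: the same low antibody level in both cases.
excess_case = percent_drop_in_free_antigen(total_antigen=1000, antibody_sites=10)
limited_case = percent_drop_in_free_antigen(total_antigen=20, antibody_sites=10)

print(f"antigen in large excess: {excess_case:.1f}% drop in free antigen")
print(f"antigen in limited amount: {limited_case:.1f}% drop in free antigen")
```

With antigen in hundred-fold excess, the signal to be measured is a 1% change against a large background, whereas the same antibody level removes half of a limited antigen pool.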

Practical assays in which the free antigen is detected and the antigen is not present in excess of the maximum amount of antibody expected in a sample have not been described. However, in van Erp, et al., Journal of Immunoassay 12(3):425-443 (1991), a fixed concentration of monoclonal antibody was incubated with a concentration dilution series of antigen, and free antigen was then measured using a gold sol particle agglutination immunoassay to determine antibody affinity constants.

There has been much research in the area of evaluating useful markers for determining the risk factor for patients developing insulin-dependent diabetes mellitus (IDDM). These include insulin autoantibodies, Soeldner, et al., supra and circulating autoantibodies to glutamic acid decarboxylase (“GAD”), Atkinson, et al., PCT/US89/05570 and Tobin, et al., PCT/US91/06872. In addition, Rabin, et al., U.S. Pat. No. 5,200,318 describes numerous assay formats for the detection of GAD and pancreatic islet cell antigen autoantibodies. GAD autoantibodies are of particular diagnostic importance because they occur in preclinical stages of the disease, which can make therapeutic intervention possible. However, the use of GAD autoantibodies as a diagnostic marker has been impeded by the lack of a convenient, nonisotopic assay.

One assay method involves incubating a support-bound antigen with the sample, then adding a labeled anti-human immunoglobulin. This is the basis for numerous commercially available assay kits for antibodies such as the Syn ELISA kit which assays for autoantibodies to GAD65, and is described in product literature entitled “Syn^(ELISA) GAD II-Antibodies” (Elias USA, Inc.). Substantial dilution of the sample is required because the method is subject to high background signals from adsorption of non-specific human immunoglobulins to the support.

Many of the assays described above involve detection of antibody that becomes bound to an immobilized antigen. This can have an adverse effect on the sensitivity of the assay due to difficulty in distinguishing between specific immunoglobulins and other immunoglobulins in the sample, which bind non-specifically to the immobilized antigen. There is not only a need to develop an assay that avoids non-specific detection of immunoglobulins, but there is also the need for an improved method of detecting antibodies that combines the sensitivity advantage of immunoprecipitation assays with a simplified protocol. Finally, assays that can help evaluate the risk of developing diseases are medically and economically very important. The present invention addresses these needs.

SUMMARY OF THE INVENTION

According to the present invention, there is provided a biosensor for use in detecting the presence of diseases, the biosensor comprising a detector for detecting a presence of at least one marker indicative of a specific disease. Also provided is a method of determining efficacy of a pharmaceutical for treating a disease or staging disease by administering a pharmaceutical to a sample containing markers for a disease, detecting the amount of at least one marker of the disease in the sample, and analyzing the amount of the marker in the sample, whereby the amount of marker correlates to pharmaceutical efficacy or disease stage. Markers for gynecological disease selected from the list in Table 6 and further from the list in Table 8 are provided. An immuno-imaging agent comprising labeled antibodies, whereby the labeled antibodies are isolated and reactive to proteins overexpressed in vivo, is provided. Informatics software for analyzing the arrays discussed above is provided, wherein the software includes analyzing means for analyzing the arrays.

DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

Other advantages of the present invention are readily appreciated as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings wherein:

FIGS. 1A-D are photographs showing the identification of a phage displaying peptide sequence of Sirt2 by plaque lift;

FIG. 2 is a photograph showing the analysis of the PCR product of the plaques by Southern Blot hybridization;

FIG. 3 is a photograph showing the Dot Blot analysis of Sirt2 positive plaques;

FIG. 4 is a photograph showing green and red labeled detection of serum antibodies indicative of the antibody reaction to the protein;

FIGS. 5 A-E are photographs showing the ECL detection of phagotopes selected with a breast cancer patient's serum;

FIGS. 6 A-C are as follows: FIG. 6A is a photograph showing the comparison of serum reaction of control and breast cancer patient with phagotopes from BP4; FIG. 6B is a graph of the BP4 filters which were scanned, thereby showing the ratio of the pixel densities plotted in rank order; and FIG. 6C is a scan of a microarray demonstrating the binding of a Cy5-labeled antihuman IgG to human IgG from patient #1's serum and the control Cy3-labeled antibody to phage T7 capsid protein to phage clones microarrayed on glass;

FIGS. 7A-7B show the method of finding informative epitopes: FIG. 7A shows the cancer template; FIG. 7B shows the spot intensities plotted on the vertical axis for 12 subjects (controls to the left and patients to the right); the template defined on the left (shown in blue) was used with a correlation distance, and a correlation threshold of 0.8 selected the 46 epitopes shown here in red (out of the total of 4×96=384 shown here in yellow);

FIGS. 8A-8B show an example comparison between the histogram of a control subject (19218) with a high but non-specific reaction (FIG. 8A), and the histogram of a patient (19223) (FIG. 8B); the histograms are calculated on the ratios of the background corrected mean intensity of the human IgG labeled with Cy5 vs. the background corrected mean intensity of the T7 labeled with Cy3;

FIGS. 9A-9B show a comparison between the scatterplot of a control subject (19218) (FIG. 9A) with a strong but non-specific reaction and the scatterplot of a patient MEC1 (19223) (FIG. 9B); the scattergrams plot the background corrected mean intensity of the human IgG labeled with Cy5 vs. the background corrected mean intensity of the T7 labeled with Cy3;

FIG. 10 shows the matrix of reactivity between sets of clones coming from patients 1-12 (in rows) and sera from the same patients (in columns); at this point (step 2 of Procedure 2), the matrix contains the results of the self-reactions: patients 1-10 have a specific self-reaction whereas patients 11 and 12 do not, so patients 11 and 12 are eliminated from the clone selection procedure;

FIG. 11 shows a matrix of reactivity between sources of clones and different sera ordered by reactivity; the clones from patient 2 react with sera from self (column 2) and patients 4 and 8; the clones from patient 3 react with sera from self (column 3) and patients 6 and 10, etc.; note that the union of the sets of clones coming from patients 2, 3, 5, 7 and 1 ensures that the chip made with these clones reacts with all patients;

FIGS. 12A-G are filter microarrays showing antigen binding with IgGs in the serum of Stage I ovarian cancer patients; and

FIGS. 13A-D are graphs showing the determination of a titerable antigen-antibody binding in ELISA macroarray analysis.

DETAILED DESCRIPTION OF THE INVENTION

Generally, the present invention provides a method and markers for use in detecting disease and stages of disease. In other words, the markers can be used to determine the presence of disease without requiring the presence of symptoms.

The method and markers of the present invention can be used to diagnose the presence of a disease or a disease stage in a patient. The method of the present invention utilizes a detector device for detecting the presence of at least one marker in the serum of the patient. The benefit of such an analytical device is that the marker that is detected is one of a panel of markers. The panel of markers can include markers that are known to those of skill in the art and markers determined utilizing the methodology disclosed herein. The markers of the present invention can be used to detect diseases. Examples of diseases include, but are not limited to, gynecological disease, such as endometriosis, ovarian cancer, breast cancer, cervical cancer, and primary peritoneal carcinoma. The method can also be used to identify overexpressed or mutated proteins in tumor cells. That such proteins are mutated or overexpressed presumably is the basis for the immune reaction to these proteins. Therefore, markers identified using these methods could serve in molecular pathology as diagnostic or prognostic markers.

The method can also be used for immunotherapy targeted to a person's immunoprofile based on the arrays. For personalized immunotherapy, the reactivity to particular epitope clones can be correlated using sera from patients having cancer. Using a comprehensive panel of epitope markers that can accurately detect early stage ovarian cancer, one can utilize these antigens as immuno-therapeutic agents personalized to the immuno-profile of each patient. When T-cells from the patient recognize antigen biomarkers, they are stimulated and activated and therefore produce an immune response. Such reactivity demonstrates the potential of each antigen as a component of a vaccine to induce a T cell-mediated immune response essential for generation of cancer vaccines. Individuals scoring positive in the presymptomatic testing for OVCA can then be offered an anti-tumor vaccine tailored to their immunoprofile against a panel of tumor antigens.

The detector includes, but is not limited to, an assay, a slide, a filter, a microarray, a macroarray, computer software implementing the data analysis methods, and any combinations thereof. The detector can also include a two-color detection system or other detector system known to those of skill in the art.

By “bodily fluid” as used herein it is meant any bodily fluid known to those of skill in the art to contain antibodies therein. Examples include, but are not limited to, blood, saliva, tears, spinal fluid, serum, and other fluids known to those of skill in the art to contain antibodies.

By “biopanning”, it is meant a selection process for use in screening a library (Parmley and Smith, Gene, 73:308 (1988); Noren, C. J., NEB Transcript, 8(1):1 (1996)). Biopanning is carried out by incubating phages encoding the peptides with a plate coated with the proteins, washing away the unbound phage, eluting, and amplifying the specifically bound phage. Those skilled in the art readily recognize other immobilization schemes that can provide equivalent technology, such as but not limited to binding the proteins or other targets to beads.

By staging the disease, as for example in cancer, it is intended to include determining the extent of a cancer, especially whether the disease has spread from the original site to other parts of the body. The stages can range from 0 to 5 with 0 being the presence of cancerous cells and 5 being the spread of the cancer cells to other parts of the body including the lymph nodes. Further, the staging can indicate the stage of a borderline histology. A borderline histology is a less malignant form of disease. Additionally, staging can indicate a relapse of disease, in other words the reoccurrence of disease.

The term “marker” as used herein is intended to include, but is not limited to, a gene or a piece of a gene which codes for a protein, a protein such as a fusion protein, open reading frames such as ESTs, epitopes, mimotopes, antigens, and any other indicator of immune response. The marker can also be used as a predictor of disease or the recurrence of disease.

The present invention further includes a random peptide epitope (mimotope) that mimics a natural antigenic epitope during epitope presentation. Such mimotopes are useful in the applications and methods discussed above. Also included in the present invention is a method of identifying a random peptide epitope. In the method, a library of random peptide epitopes is generated or selected. The library is contacted with an anti-antibody. Mimotopes are identified that are specifically immunoreactive with the antibody. Sera (containing anti antibodies) or antibodies generated by the methods of the present invention can be used. Random peptide libraries can, for example, be displayed on phage (phagotopes) or generated as combinatorial libraries.

“Antibody” refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the various immunoglobulin diversity/joining/variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.

An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (V_(L)) and variable heavy chain (V_(H)) refer to these light and heavy chains respectively.

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)′₂, a dimer of Fab which itself is a light chain joined to V_(H)-C_(H) 1 by a disulfide bond. The F(ab)′₂ can be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)′₂ dimer into an Fab′ monomer. The Fab′ monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill can appreciate that such fragments can be synthesized de novo either chemically or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554 (1990)).

For preparation of monoclonal or polyclonal antibodies, any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)). Techniques for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or other organisms such as other mammals, can be used to express humanized antibodies. Alternatively, phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992)).

A “chimeric antibody” is an antibody molecule in which (a) the constant region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site (variable region) is linked to a constant region of a different or altered class, effector function and/or species, or an entirely different molecule which confers new properties to the chimeric antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable region, or a portion thereof, is altered, replaced or exchanged with a variable region having a different or altered antigen specificity.

The term “immunoassay” is an assay wherein an antibody specifically binds to an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen. In addition, an antigen can be used to capture or specifically bind an antibody.

The phrase “specifically (or selectively) binds” to an antibody or “specifically (or selectively) immunoreactive with,” when referring to a protein or peptide, refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times the background and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions can require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to modified β-tubulin from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive, e.g., with β-tubulin modified at cysteine 239 and not with other proteins. This selection can be achieved by subtracting out antibodies that cross-react with other molecules. Monoclonal antibodies raised against modified β-tubulin can also be used. A variety of immunoassay formats can be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction can be at least twice background signal or noise and more typically more than 10 to 100 times background.

A “label” or a “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include ³²P, fluorescent dyes, iodine, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins for which antisera or monoclonal antibodies are available, e.g., by incorporating a radiolabel into the peptide, or any other label known to those of skill in the art.

A “labeled antibody or probe” is one that is bound, either covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds to a label such that the presence of the antibody or probe can be detected by detecting the presence of the label bound to the antibody or probe.

The terms “isolated,” “purified,” or “biologically pure” refer to material that is substantially or essentially free from components that normally accompany it as found in its native state. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified. The term “purified” denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least 85% pure, optionally at least 95% pure, and optionally at least 99% pure.

The term “recombinant” when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under expressed or not expressed at all.

An “expression vector” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be transcribed operably linked to a promoter.

By “support or surface” as used herein, the term is intended to include, but is not limited to, a solid phase which is typically a support or surface, which is a porous or non-porous water insoluble material that can have any one of a number of shapes, such as strip, rod, particle, including beads and the like. Suitable materials are well known in the art and are described in, for example, Ullman, et al. U.S. Pat. No. 5,185,243, columns 10-11, Kurn, et al., U.S. Pat. No. 4,868,104, column 6, lines 21-42 and Milburn, et al., U.S. Pat. No. 4,959,303, column 6, lines 14-31, which are incorporated herein by reference. Binding of ligands and receptors to the support or surface can be accomplished by well-known techniques, readily available in the literature. See, for example, “Immobilized Enzymes,” Ichiro Chibata, Halsted Press, New York (1978) and Cuatrecasas, J. Biol. Chem. 245:3059 (1970). Whatever type of solid support is used, it must be treated so as to have bound to its surface either a receptor or ligand that directly or indirectly binds the antigen. Typical receptors include antibodies, intrinsic factor, specifically reactive chemical agents such as sulfhydryl groups that can react with a group on the antigen, and the like. For example, avidin or streptavidin can be covalently bound to spherical glass beads of 0.5-1.5 mm and used to capture a biotinylated antigen.

Signal producing system (“sps”) includes one or more components, at least one component being a label, which generate a detectable signal that relates to the amount of bound and/or unbound label, i.e. the amount of label bound or not bound to the compound being detected. The label is any molecule that produces or can be induced to produce a signal, such as a fluorescer, enzyme, chemiluminescer, or photosensitizer. Thus, the signal is detected and/or measured by detecting enzyme activity, luminescence, or light absorbance.

Suitable labels include, by way of illustration and not limitation, enzymes such as alkaline phosphatase, glucose-6-phosphate dehydrogenase (“G6PDH”) and horseradish peroxidase; ribozyme; a substrate for a replicase such as Q-beta replicase; promoters; dyes; fluorescers such as fluorescein, isothiocyanate, rhodamine compounds, phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde, and fluorescamine; chemiluminescers such as isoluminol; sensitizers; coenzymes; enzyme substrates; photosensitizers; particles such as latex or carbon particles; suspendable particles; metal sol; crystallite; liposomes; cells, etc., which can be further labeled with a dye, catalyst, or other detectable group. Suitable enzymes and coenzymes are disclosed in Litman, et al., U.S. Pat. No. 4,275,149, columns 19-28, and Boguslaski, et al., U.S. Pat. No. 4,318,980, columns 10-14; suitable fluorescers and chemiluminescers are disclosed in Litman, et al., U.S. Pat. No. 4,275,149, at columns 30 and 31; which are incorporated herein by reference. Preferably, at least one sps member is selected from the group consisting of fluorescers, enzymes, chemiluminescers, photosensitizers, and suspendable particles.

The label can directly produce a signal, and therefore, additional components are not required to produce a signal. Numerous organic molecules, for example fluorescers, are able to absorb ultraviolet and visible light, where the light absorption transfers energy to these molecules and elevates them to an excited energy state. This absorbed energy is then dissipated by emission of light at a second wavelength. Other labels that directly produce a signal include radioactive isotopes and dyes.

Alternately, the label may need other components to produce a signal, and the sps can then include all the components required to produce a measurable signal, which can include substrates, coenzymes, enhancers, additional enzymes, substances that react with enzymatic products, catalysts, activators, cofactors, inhibitors, scavengers, metal ions, specific binding substance required for binding of signal generating substances, and the like. A detailed discussion of suitable signal producing systems can be found in Ullman, et al. U.S. Pat. No. 5,185,243, columns 11-13, which is incorporated herein by reference.

The label is bound to a specific binding pair (hereinafter “sbp”) member which is the antigen, or is capable of directly or indirectly binding the antigen, or is a receptor for the antigen, and includes, without limitation, the antigen; a ligand for a receptor bound to the antigen; a receptor for a ligand bound to the antigen; an antibody that binds the antigen; a receptor for an antibody that binds the antigen; a receptor for a molecule conjugated to an antibody to the antigen; an antigen surrogate capable of binding a receptor for the antigen; a ligand that binds the antigen, etc. Binding of the label to the sbp member can be accomplished by means of non-covalent bonding as for example by formation of a complex of the label with an antibody to the label or by means of covalent bonding as for example by chemical reactions which result in replacing a hydrogen atom of the label with a bond to the sbp member or can include a linking group between the label and the sbp member. Such methods of conjugation are well known in the art. See for example, Rubenstein, et al., U.S. Pat. No. 3,817,837, which is incorporated herein by reference. Other sps members can also be bound covalently to sbp members. For example, in Ullman, et al., U.S. Pat. No. 3,996,345, two sps members such as a fluorescer and quencher can be bound respectively to two sbp members that both bind the analyte, thus forming a fluorescer-sbp₁:analyte:sbp₂-quencher complex. Formation of the complex brings the fluorescer and quencher in close proximity, thus permitting the quencher to interact with the fluorescer to produce a signal. This is a fluorescent excitation transfer immunoassay. Another concept is described in Ullman, et al., EP 0,515,194 A2, which uses a chemiluminescent compound and a photosensitizer as the sps members. This is referred to as a luminescent oxygen channeling immunoassay. Both the aforementioned references are incorporated herein by reference.

The analysis of mRNA expression in tumors does not necessarily reveal the status of protein levels in the cancer cells. Other factors such as protein half-life and mutation can be altered without an effect on mRNA levels, thus masking significant molecular changes at the protein level. Serum antibody reactivity to cellular proteins occurs in cancer patients due to presentation of mutated forms of proteins from the tumor cells or overexpression of proteins in the tumor cells. The host immune system can direct individuals to molecular events critical to the genesis of the disease. Using a candidate gene approach, experience has shown that the frequency of serum positivity to any single protein is low. Therefore, to increase the identification of such autoantigens, a more global approach is employed to exploit immunoreactivity to identify large numbers of cDNAs coding for proteins that are mutated or upregulated in cancer cells.

In order to develop an effective screening test for early detection of ovarian cancer, cDNA phage display libraries are used to isolate cDNAs coding for epitopes reacting with antibodies present specifically in the sera of patients with ovarian cancer. The methods of the present invention detect various antibodies that are produced by patients in reaction to proteins overexpressed in their ovarian tumors. This is achievable by differential biopanning technology using human sera collected both from normal individuals and patients having ovarian cancer and phage display libraries expressing cDNAs of genes expressed in ovarian epithelial tumors and cell lines. Serum reactivity toward a cellular protein can occur because of the presentation to the immune system of a mutated form of the protein from the tumor cells or overexpression of the protein in the tumor cells. The strategy provides for the identification of epitope-bearing phage clones (phagotopes) displaying reactivity with antibodies present in sera of patients having ovarian cancer but not in control sera from unaffected women. This strategy leads to the identification of novel disease-related epitopes for diseases including, but not limited to, ovarian cancer, that have prognostic/diagnostic value with additional potential for therapeutic vaccines and medical imaging reagents. This also creates a database that can be used to determine both the presence of disease and the stage of the disease.

The series of experiments disclosed herein provides direct evidence that biopanning a T7 coat protein fusion library can isolate epitopes for antibodies present in polyclonal sera. The experiments also show that the technology can be applied to direct microarray screening of large numbers of selected phage against numerous patient and control sera. This approach provides a large number of biomarkers for early detection of disease.

More specifically, the methods of the present invention provide four to five cycles of affinity selection and biopanning which are carried out with biological amplification of the phage after each biopanning, meaning growth of the biological vector of the cDNA expression clone in a biological host. Examples of biological amplification include but are not limited to growth of a lytic or lysogenic bacteriophage in host bacteria or transformation of a bacterial host with selected DNA of the cDNA expression vector. The number of biopanning cycles generally determines the extent of the enrichment for phage that binds to the sera of patients with ovarian cancer. This strategy allows for one cycle of biopanning to be performed in a single day. Someone skilled in the art can establish different schedules of biopanning that provide the same essential features of the procedure described above.

Two biopanning experiments are performed with each library, differentially selecting clones between control and disease patient sera. The first selection is to isolate phagotope clones that do not bind to control sera pooled from control women but do bind to a pool of disease patient serum. This set of phagotope clones represents epitopes that are indicative of the presence of disease as recognized by the host immune system. The second type of screening is performed to isolate phagotope clones that do not bind to a pool of control sera but do bind to an individual patient's serum. Those sets of phagotope clones represent epitopes that are indicative of the presence of disease.

Subsequent to the biopanning, the clones so isolated can be used to contact antibodies in sera by spotting the clones or peptide sequences of amino acids containing those encoded by the clones. After spotting on a solid support, the arrays are rinsed briefly in a 1% BSA/PBS to remove unbound phage, then transferred immediately to a 1% BSA/PBS blocking solution and allowed to sit for 1 hour at room temperature. The excess BSA is rinsed off from the slides using PBS. This step ensures that the elution step of antibodies is more effective. The use of PBS elutes all of the antibodies without harming the binding of the antibody. Antibody detection of reaction with the clones or peptides on the array is carried out by labeling of the serum antibodies or through the use of a labeled secondary antibody that reacts with the patient's antibodies. A second control reaction to every spot allows for greater accuracy of the quantitation of reactivity and increases sensitivity of detection.

The slides are subsequently processed to quantify the reaction of each phagotope. Such processing is specific to the label used. For instance, if the fluorophore labels Cy3 and Cy5 are used, this processing is done in a laser scanner that captures an image of the slide for each fluorophore used. Subsequent image processing familiar to those skilled in the art can provide intensity values for each phagotope.

The data analysis can be divided into the following steps:

1. Pre-processing and normalization.

2. Identifying the most informative markers.

3. Building a predictor for molecular diagnosis of ovarian cancer and validating the results.

The purpose of the first step is to cleanse the data from artifacts and prepare it for the subsequent steps. Such artifacts are usually introduced in the laboratory and include: slide contamination, differential dye incorporation, scanning and image processing problems (e.g. different average intensities from one slide to another), imperfect spots due to imperfect arraying, washing, drying, etc. The purpose of the second step is to select the most informative phages that can be used for diagnostic purposes. The purpose of the third step is to develop a software classifier able to diagnose cancer based on the antibody reactivity values of the selected phages. The last step also includes the validation of this classifier and the assessment of its performance using various measures such as specificity, sensitivity, positive predictive value and negative predictive value. The computation of such measures can be done on cases not used during the design of the chip in order to assess the real-world performance of the diagnosis tool obtained.
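By way of illustration only, the performance measures named above can be computed from the classifier calls on held-out cases as in the following minimal sketch. The function name and the boolean encoding of the labels are assumptions made for this example and are not part of the disclosed software.

```python
def diagnostic_measures(predicted, actual):
    """Return sensitivity, specificity, PPV and NPV for boolean labels.

    predicted[i] is True when the classifier calls subject i a cancer case;
    actual[i] is True when subject i is a confirmed cancer case.
    Both encodings are assumptions of this sketch.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives

    def ratio(num, den):
        # An undefined measure (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

The measures are only meaningful as an estimate of real-world performance when the subjects passed in were not used for epitope selection or classifier design.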

In the pre-processing and normalization step, for arrays using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control, the spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background corrected value is calculated by subtracting the background from the signal. If necessary, non-linear dye effects can be eliminated by performing an exponential normalization (Houts, 2000) and/or LOESS normalization of the data and/or a piecewise linear normalization (see FIGS. 7 A-D). The values coming from each channel are subsequently divided by the mean of the intensities of that channel over the whole array. Subsequently, the ratio between the IgG and the T7 channels is calculated. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/− two standard deviations) are flagged for manual inspection. Single channel arrays are pre-processed in a similar way but without taking the ratios. This preprocessing sequence was shown to provide good results for all preliminary data analyzed.
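The two-channel sequence described above (background correction, scaling of each channel by its array-wide mean, IgG/T7 ratio, and combination of replicate spots) can be sketched as follows. This is an illustrative sketch only: the dictionary keys, the function name and the decision to skip spots without a positive T7 signal are assumptions of the example. The non-linear dye normalizations and the outlier flagging are left out, the latter because the text does not state which distribution the two standard deviations refer to.

```python
from statistics import mean, stdev

def preprocess_two_channel(spots):
    """Return {clone: (mean_ratio, sd_ratio)} for a two-channel array.

    Each spot is a dict with the hypothetical keys 'clone', 'igg', 'igg_bg',
    't7' and 't7_bg' (mean spot and background intensities per channel).
    """
    # Background correction: subtract the background from the signal.
    igg = [s["igg"] - s["igg_bg"] for s in spots]
    t7 = [s["t7"] - s["t7_bg"] for s in spots]

    # Scale each channel by its mean intensity over the whole array.
    igg_mean, t7_mean = mean(igg), mean(t7)

    ratios = {}
    for spot, i, t in zip(spots, igg, t7):
        if t <= 0:
            # Assumption: a spot without T7 control signal cannot be ratioed.
            continue
        ratio = (i / igg_mean) / (t / t7_mean)
        ratios.setdefault(spot["clone"], []).append(ratio)

    # Combine replicate spots by their mean and standard deviation.
    return {
        clone: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for clone, values in ratios.items()
    }
```

For a single channel array the same sketch would apply without the division by the T7 channel.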

The step of selecting the most informative markers is used to identify the most informative phages out of the large set of phages started with. The better the selection, the better is the expected accuracy of the diagnosis tool.

A first test is necessary to determine whether a specific epitope is suitable for inclusion in the final set to be spotted. The selection methods to be applied follow the principles of the methods successfully applied in (Golub et al., 1999; Alizadeh et al., 2000) and are briefly described below.

Procedure 1

The procedure is initiated by defining a template for the cancer case (FIG. 8). Unlike gene expression experiments, where the expression level of a gene can be either up or down in cancer vs. healthy subjects, here one is testing for the presence of antibodies specific to cancer. Therefore, epitopes with high reactivity in controls and low reactivity in patients are not expected and the profile to the left in FIG. 8 is sufficient. Each epitope can have a profile across the given set of patients (FIGS. 9 A and B). The profile of each epitope is compared with the template using a correlation-based distance. Those skilled in the art can recognize that other distances may be used without essentially changing the procedure.

The epitopes are then ordered based on the similarity between the reference profile (FIG. 8) and their actual profile. FIG. 7 shows 46 epitopes found informative for a correlation threshold of 0.8. The final cutoff threshold is calculated by doing 1000 random permutations once the whole data set becomes available. Each such permutation randomly moves the subjects between the ‘patient’ and ‘control’ categories. Calculating the score of each epitope profile for such permutations allows us to establish a suitable threshold for the similarity (Golub et al., 1999).
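Procedure 1 can be illustrated by the following minimal sketch, which scores each epitope profile against a 0/1 cancer template with a Pearson correlation, keeps the epitopes at or above a threshold, and derives a permutation-based cutoff. Several details are assumptions of the example rather than statements of the disclosed method: the template encoding (0 for controls, 1 for patients), the use of the best score per permutation as the null statistic, and the 0.95 quantile used to read a cutoff from the permutation scores.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_epitopes(profiles, template, threshold=0.8):
    """Rank epitopes by similarity to the template and keep those >= threshold."""
    scores = {name: pearson(p, template) for name, p in profiles.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]

def permutation_threshold(profiles, template, n_perm=1000, quantile=0.95, seed=0):
    """Estimate a cutoff from scores obtained with randomly reassigned labels."""
    rng = random.Random(seed)
    labels = list(template)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # move subjects between 'patient' and 'control'
        null_scores.append(max(pearson(p, labels) for p in profiles.values()))
    null_scores.sort()
    # Assumption: the cutoff is the chosen quantile of the best permuted scores.
    return null_scores[min(int(quantile * n_perm), n_perm - 1)]
```

A correlation-based distance can be obtained as one minus the correlation, so ranking by decreasing correlation is equivalent to ranking by increasing distance.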

The technique follows closely the one used in (Golub, 1999). However, the technique can be further improved as follows. Firstly, this technique was shown to provide good results if most controls are consistent by providing the same type of reactivity. However, preliminary data showed that there are control subjects that show a non-specific reactivity with all clones (see FIG. 1 b). FIG. 8 shows a comparison between the histogram profile of a control subject showing a non-specific reaction (19218) and the profile of a patient (19223). FIG. 9 shows the scatterplots of the same subjects. While still clearly different from patients, such control subjects with a high non-specific reaction introduce spikes in the clone profile in the area corresponding to the control subjects (left-hand side of the template in FIG. 8). Such spikes decrease the score of the relevant clones, making them more difficult to distinguish from the irrelevant ones. In order to reduce this effect, all control subjects with a non-specific response (i.e., a unimodal distribution such as in the left panel of FIG. 7) were eliminated from the analysis leading to the epitope selection.

A second essential modification is related to the set of epitopes selected. There are rare patients who might react only to a small number of very specific epitopes. If the selection of the epitopes is done on statistical grounds alone, such very specific epitopes can be missed if the set of patients available contains only a few such rare patients. In order to maximize the sensitivity of the final test resulting from this work, every effort was made to include epitopes which might be the only ones reacting to rare patients. In order to do this, the information content of the set of epitopes is maximized while trying to minimize the number of epitopes used, using the following procedure.

Procedure 2

Assume there are m patients and k controls. Select n random patients from the m available. For each of the n patients used for epitope selection, amplify (n×4 biopannings) and do self-reactions. Eliminate those patients/epitopes that do not react to self.

Make a chip with all available, self-reacting epitopes printed in quadruplicate. React this chip with all patients and controls (n+k antibody reactions). Eliminate controls with a non-specific reactivity. For the set of epitopes coming from a single patient, apply Procedure 1 to order the epitopes in the order of their informational content and select the ones that can be used to differentiate patients from controls.

Order the epitopes by their reactivity in decreasing order of the number of patients they react to. Scan this list from the top down, moving epitopes from this list to the final set. Every time a set of epitopes coming from a patient x is added to the final set, the patient x and all other patients that these epitopes react to are marked as represented in the current set of epitopes. Repeat until all patients are represented in the current set of epitopes.

This procedure tries to minimize the number of epitopes used while maximizing the number of patients that react to the chip containing the selected epitopes.

The following simple example shows how this procedure works. The matrix in FIG. 10 contains a row i for the clones coming from patient i and a column j for the serum coming from patient j. A serum is said to react specifically with a set of clones if the histogram of the ratios is bimodal (see subject 19218 in FIGS. 8 and 9). A serum is said to react non-specifically if the histogram of the ratios is unimodal (see subject 19223 in FIGS. 8 and 9). Furthermore, a serum might not react at all with a set of clones. If the serum from patient j reacts specifically with the clones from patient i, the matrix can contain a value of 1 at the position (i, j). The element at position (i, j) is left blank if there is no reaction or the reaction is non-specific.

Each set of epitopes corresponding to a row of the matrix is pruned by sub-selecting epitopes according to Procedure 1. The rows are now sorted in decreasing reactivity (number of patients other than self that the clones react to). For instance, in FIG. 11, the clones from patient 2 react with sera from self (column 2) and patients 4 and 8. The clones from patient 3 react with sera from self (column 3) and patients 6 and 10, etc. The final set of clones was obtained from patients 2, 3, 5, 7 and 1 (reading top-down in column 1). Clones coming from patients 8, 9 and 10 are not included since these patients already react to clones coming from other patients. This set ensures that the chip made with these clones reacts with all patients in this example.
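The scan described in Procedure 2 and illustrated in the example above can be sketched in Python as follows. This is a minimal illustration only, not the applicants' software. The function name `select_clone_sets` and the reaction data are hypothetical: only the rows for patients 2 and 3 are taken from the text, and the remaining rows are invented so that the example is complete. The sketch assumes that every retained clone set reacts with its own patient's serum, as Procedure 2 requires.

```python
def select_clone_sets(reactions):
    """Pick clone sets so that every patient is represented.

    reactions maps a patient id (source of a clone set) to the set of
    patient ids whose sera react specifically with those clones.
    """
    # Sort rows by the number of patients other than self that react,
    # in decreasing order (ties keep their input order).
    ordered = sorted(reactions,
                     key=lambda p: len(reactions[p] - {p}),
                     reverse=True)
    covered = set()
    selected = []
    for patient in ordered:
        if patient in covered:
            continue  # already represented by clones from another patient
        selected.append(patient)
        covered |= reactions[patient] | {patient}
    return selected, covered


# Hypothetical reaction matrix; rows 2 and 3 follow the FIG. 11 description.
reactions = {
    1: {1}, 2: {2, 4, 8}, 3: {3, 6, 10}, 4: {4}, 5: {5, 9},
    6: {6}, 7: {7}, 8: {8}, 9: {9}, 10: {10},
}
selected, covered = select_clone_sets(reactions)
print(selected)  # [2, 3, 5, 1, 7]
```

With these made-up data the selected sets come from patients 2, 3, 5, 1 and 7, and all ten patients are represented. The order among rows of equal reactivity is arbitrary in this sketch.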

Procedure 3

Arrays using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control, are processed as follows. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background-corrected value is calculated by subtracting the background from the signal. The values coming from each channel are normalized by dividing by their mean. Subsequently, the ratio between the IgG and the T7 channels is calculated and a logarithmic function is applied. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/− two standard deviations) are flagged for manual inspection. Someone skilled in the art can recognize that various combinations and permutations of the steps above, or similar steps, could replace the normalization procedure above without substantially changing the rest of the data analysis process. Such similar steps include, without limitation, taking the median instead of the mean, using logarithmic functions in various bases, etc.
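The normalization steps of Procedure 3 can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not the software used in this work. The function names, the choice of a base-2 logarithm, and the input layout (replicate spots stored consecutively) are assumptions, and the background-corrected intensities are assumed to be positive. Outlier flagging is left out because the text does not state which distribution the two standard deviations refer to.

```python
import math
from statistics import mean, stdev


def normalized_log_ratios(igg, igg_background, t7, t7_background):
    """Per-spot log2 ratio of the normalized IgG (Cy5) and T7 (Cy3) channels."""
    # Background correction: subtract the background from the signal.
    igg_corr = [s - b for s, b in zip(igg, igg_background)]
    t7_corr = [s - b for s, b in zip(t7, t7_background)]
    # Normalize each channel by dividing by its mean.
    igg_mean = mean(igg_corr)
    t7_mean = mean(t7_corr)
    # Ratio between the channels, then a logarithm (base 2 is an assumption).
    return [math.log2((i / igg_mean) / (t / t7_mean))
            for i, t in zip(igg_corr, t7_corr)]


def combine_replicates(values, replicates=4):
    """Mean and standard deviation for each consecutive group of replicates."""
    groups = [values[i:i + replicates]
              for i in range(0, len(values), replicates)]
    return [(mean(g), stdev(g)) for g in groups]


# Hypothetical intensities for two clones printed in quadruplicate.
igg = [110, 110, 110, 110, 410, 410, 410, 410]
t7 = [60] * 8
background = [10] * 8
ratios = normalized_log_ratios(igg, background, t7, background)
print(combine_replicates(ratios))
```

Replacing `mean` with `median`, or `math.log2` with another base, gives the variants mentioned in the text.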

The histogram of the average log ratio is calculated. If the histogram is unimodal (e.g. subject 19223 in FIG. 7), there is no specific response. If the histogram is clearly bimodal (e.g. subject 19218 in FIG. 7), there is a specific response. All 25 subjects analyzed so far fell in one of these two categories or had no response at all. A mixed probability model is used in less clear cases to fit two normal distributions as in (Lee, 2000). If the two distributions found under the maximum likelihood assumption are separated by a distance d of more than 2 standard deviations (corresponding to a p-value of approximately 0.05), there is a specific response. If the distance is less than 2 standard deviations, the response can be considered as not specific. The preliminary data analyzed so far showed a very good separation of the distributions for the patients.
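The mixture-model decision described above can be sketched as follows. This is a minimal expectation-maximization fit of two normal distributions written for illustration; it is not the method of (Lee, 2000) and has not been validated on array data. The function names, the initialization (splitting the sorted values in half), and the use of a weighted pooled standard deviation as the unit for the distance d are all assumptions, since the text does not specify them.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_normals(values, iterations=200, min_sigma=1e-3):
    """Fit a two-component normal mixture by EM (needs at least two values)."""
    xs = sorted(values)
    half = len(xs) // 2
    mu = [mean(xs[:half]), mean(xs[half:])]       # assumed initialization
    sigma = [max(pstdev(xs), min_sigma)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)
    return weight, mu, sigma


def is_specific(values, threshold=2.0):
    """True if the fitted means are more than `threshold` pooled SDs apart."""
    weight, mu, sigma = fit_two_normals(values)
    pooled = math.sqrt(weight[0] * sigma[0] ** 2 + weight[1] * sigma[1] ** 2)
    return abs(mu[1] - mu[0]) / pooled > threshold


# Hypothetical average log ratios with two well-separated groups of clones.
log_ratios = [-0.2, -0.1, 0.0, 0.1, 0.2] * 4 + [2.8, 2.9, 3.0, 3.1, 3.2] * 2
print(is_specific(log_ratios))
```

A different definition of the distance d (for example, in units of the larger of the two standard deviations) would change the decision in borderline cases.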

Once the chosen clones are spotted on the final version of the array, a number of sera coming from both patients and controls can be tested. These sera come from subjects not used in any of the phases that lead to the fabrication of the array (i.e. not involved in clone selection, not used as controls, etc.). Each test was evaluated using Procedure 3 above. The performance on this validation data can be reported in terms of PPV, NPV, specificity and sensitivity. Since these performance indicators are calculated on data not previously used, they provide a good indication of the performance of the test for screening purposes for the various categories of patients envisaged in the general population.
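For reference, the four performance indicators named above follow directly from the counts of true and false positives and negatives on the validation sera. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not results of this work.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening indicators from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # patients correctly called positive
        "specificity": tn / (tn + fp),  # controls correctly called negative
        "ppv": tp / (tp + fp),          # positive calls that are patients
        "npv": tn / (tn + fn),          # negative calls that are controls
    }


# Hypothetical validation counts: 50 patients and 100 controls.
metrics = screening_metrics(tp=45, fp=10, tn=90, fn=5)
print(metrics)
```

Note that PPV and NPV depend on the proportion of patients among the tested sera, so values obtained on a validation set apply to a screening population only if that proportion is comparable.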

The present invention also provides a kit including all of the technology for performing the above analysis. The kit is provided in a container of a size sufficient to hold all of the required pieces for analyzing sera, as well as a digital medium such as a floppy disk or CD-ROM containing the software necessary to interpret the results of the analysis. These components include the array of clones or peptides spotted onto a solid support, prewashing buffers, a detection reagent for identifying reactivity of the patients' serum antibodies to the spotted clones or peptides, post-reaction washing buffers, primary and secondary antibodies to quantify reactivity of the patients' serum antibodies with the spotted array, and methods to analyze the reactivity so as to establish an interpretation of the serum reactivity.

A biochip for detecting the presence of the disease state in a patient's sera is provided by the present invention. The biochip has a detector contained within the biochip for detecting antibodies in a patient's sera. This allows a patient's sera to be tested for the presence of a multitude of diseases or reactions to disease markers using a single sample, and the analysis can be conducted and analyzed on a single chip. Utilizing such a chip lowers the time required for the detection of disease while also enabling a doctor to determine the level of disease spread or infection. The chip, or other informatics system, can be altered to weigh the results. In other words, the informatics can be altered to adjust the levels of sensitivity and/or specificity of the chip.

The present invention is well suited for providing useful information regarding the efficacy of pharmaceuticals at treating disease. Specifically, the present invention is well suited for measuring the effects of drugs and other medications based on the above-identified markers. The present invention determines the minimum level of a pharmaceutical needed to achieve therapeutic benefits. Thus, the present invention is useful in determining effective treatment of various diseases and illnesses. The results of the analysis can be utilized to determine if the treatment is effective or if such treatment needs to be altered.

Further, the treatment can be altered based upon the markers detected. For example, the treatment can be specifically designed based upon the markers identified. In other words, the therapy can be altered to most suitably treat the identified markers, such that the treatment is designed to most efficiently treat the identified marker. The ability to adjust the therapy enables the treatment to be tailored to the needs of the person being treated. The treatments that can be used range from vaccines to chemotherapy.

The markers of the present invention can also be used for immuno-imaging. Immuno-imaging is a process in which antibodies to a specific antigen are labeled such that the label can be detected externally. Examples of external detectors include, but are not limited to, x-rays, MRI, CT scans, and PET scans. The imaging functions because an imaging reagent containing the labeled antibody is administered to a patient.

The above discussion provides a factual basis for the use of the combination of markers and the method of making the combination. The methods used with and the utility of the present invention can be shown by the following non-limiting examples and accompanying figures.

EXAMPLES

Example 1

The purpose of this study is to clone epitopes that are recognized by sera from women with ovarian cancer but not recognized by normal sera from unaffected women. As these epitopes are cloned, protein array assays are developed capable of detecting ovarian cancer at an early stage by analyzing antigens recognized in the sera of at-risk women. Toward this end, individual sera were screened using these protein biochips to determine the antibody reactivity to each protein epitope. Antibody reactivity is detected that does not appear in control sera. The patient and control sera obtained for this study were used to calibrate the protein biochips and identify the most informative epitope-clones. The women were monitored for the appearance or reappearance of antibody reactivity and its correlation with tumor burden. By following the serum reactivity to tumor reactive new epitopes on the arrays of the phage display cDNA clones, the analysis of sera from women after their initial diagnosis and semiannually thereafter allows the determination of the markers in predicting tumor recurrence.

Some of the markers can be predictive of recurrence, and thus can be used to correlate specific ovarian tumor types (using the World Health Organization Histological Classification of Ovarian Tumors), the tumor grade (where appropriate, since not all tumors are graded), and the surgical stage. This can be done by review of the pathological material (glass slides, patient records, and surgical pathology reports). Certain currently accepted biomarkers of research interest, such as Her-2 neu and others, can also be included in the new protein biochips in order to compare the sensitivity and specificity of the new and existing immunohistochemical technologies. Testing for Her-2 neu and other biological markers is done by the immunoperoxidase method using formalin fixed, paraffin embedded tumor tissues.

For the purpose of comparison to the ovarian cancer patients, one can analyze serum markers in women in good health who do not have ovarian or any other type of cancer. These control subjects should not have a family history of ovarian cancer or breast cancer. Because some serum markers such as CA125 levels are increased in endometriosis, uterine leiomyoma, pelvic inflammatory disease, early pregnancy, and benign cysts, control subjects should be free of these conditions as well.

The series of experiments provides direct evidence that biopanning a T7 coat protein fusion library can isolate epitopes for antibodies present in polyclonal sera. This also showed that the technology can be applied to direct microarray screening of large numbers of the selected phage against numerous patient and control sera. This approach provides a large number of biomarkers for early detection of ovarian cancer. The likelihood of success of this approach is increased by the fact that the mRNA for human Sirt2 is present at very low abundance in human brain RNA, thus indicating that clones can be isolated for rare RNA transcripts by this approach.

To further demonstrate the feasibility of these methods for differential detection of epitopes between test and control sera, four cycles of biopanning of a commercial Novagen breast tumor cDNA library were performed using a serum sample from a breast cancer patient and a control serum sample from a woman without cancer. 100 plaques were picked from each biopanning. For analysis, 100 plaques from the initial library and from each successive biopanning were amplified in microtitre plates and the lysates cleared by centrifugation. One half microliter of each sample was spotted onto nitrocellulose filters and immunodetection was performed using the breast cancer patient serum at 1:20,000 dilution (FIG. 5). Clear enrichment during biopanning is seen, as was observed above with the anti-Sirt2 rabbit serum. As seen in FIG. 6 (using randomly picked plaques from BP 4), the filters contacted with the control serum on the left panels demonstrate weaker spot intensity as compared to a duplicate filter of the same clones on the right that was contacted with the patient serum. Approximately 65% of the phage selected for reactivity to the patient's serum were more than 3-fold more reactive with the patient's serum than with the control serum, as determined by scanning densitometry.

FIG. 6A shows a comparison of serum reaction of control and breast cancer patient with phagotopes from BP4. FIG. 6B shows the BP4 filters that were scanned and the ratio of the pixel densities plotted in rank order.
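The rank-ordered ratio plot and the fraction of clones exceeding a 3-fold difference can be computed as in the following sketch. The function names and the pixel densities are hypothetical and serve only to show the arithmetic; they are not the densitometry data of FIG. 6.

```python
def ranked_ratios(patient_density, control_density):
    """Patient/control pixel density ratio per clone, in decreasing rank order."""
    ratios = [p / c for p, c in zip(patient_density, control_density)]
    return sorted(ratios, reverse=True)


def fraction_above(ratios, fold=3.0):
    """Fraction of clones whose ratio exceeds the given fold difference."""
    return sum(r > fold for r in ratios) / len(ratios)


# Hypothetical pixel densities for four clones on duplicate filters.
patient = [900, 400, 1200, 150]
control = [100, 200, 300, 150]
ratios = ranked_ratios(patient, control)
print(ratios)                  # [9.0, 4.0, 2.0, 1.0]
print(fraction_above(ratios))  # 0.5
```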

This experiment demonstrates that one can differentially detect the epitopes for which the process is selecting, i.e. those bound to protein G-agarose beads in association with antibodies in the patient's serum and not the control serum. Someone skilled in the art can recognize that other solid supports for biopanning could replace the protein-G beads without substantively changing the biopanning process. These data also indicate that the selection is imperfect. Not all of the selected phagotopes are more reactive with the patient's serum than the control serum. Therefore, the identification of the most informative phagotopes requires analysis of the reactivity with multiple, individual patients' sera tested at various serum dilutions.

The immune reactivity to human tumors recognizes changes in the expression levels and mutation status of proteins in the tumor cells. These types of immunological reactivity are not observed in sera from control subjects. The antibody titer to tumor specific epitopes can be proportional to the tumor burden. The immune reactivity to human tumors can be used diagnostically and prognostically to predict the presence and behavior of human tumors, such as tumor recurrence. Serum reactivity to single proteins tends to incompletely identify tumor bearing patients, and therefore more robust methods are necessary to accurately identify tumor occurrence and recurrence. Whole genome-based proteomics such as the technology and data analysis methods embodied in the application can more comprehensively identify those proteins recognized by the host immune system.

Those of skill in the art are familiar with the construction of cDNA libraries and there are numerous published papers on isolation of cDNAs from human cells in culture using this technology (Chiao et al., 1992; Shin et al., 1993; Buettner et al., 1993; Kim et al., 1996; Deyo et al., 1998; Bauer et al., 1998). cDNA libraries can be prepared from ovarian cancer cell lines or from ovarian tumor tissue. A tumor tissue cDNA library can be prepared from a pool of mRNA preparations from each of the different stages of cancer to increase the diversity of clones in the library.

Methods

mRNA from one ovarian cancer cell line, SKOV3, and from ovarian tumor tissues was copied into cDNA and libraries were prepared. Tumor tissue in excess of that needed for pathological evaluation was obtained by informed consent from ovarian cancer patients.

Sera were obtained from 1) ovarian cancer patients at the time of diagnosis and at six month intervals during the follow up physician visits; 2) unaffected women for control sera.

T7 cDNA phage display expression libraries are prepared for biopanning experiments, to select phage bearing epitopes, i.e. phagotopes, that are recognized by sera from women with ovarian cancer but not recognized by normal sera from unaffected women. For the biopanning process, sera from women in the control group were pooled to avoid individual variations unrelated to the presence of ovarian cancer.

The selection of the most informative epitopes was done by comparing the immune reaction profile of each individual epitope with templates defined for each disease stage. Several distances and information entropy measures were used. Several predictors were constructed based on three selected machine learning techniques using only a part of the available data. Specificity, sensitivity, positive predictive value and negative predictive value were calculated for each such classifier. The validation of the predictors and the selection of the best predictor were done by cross-validation on cases that had not been used during the predictor construction.

For example, to develop an effective screening test for early detection of ovarian cancer, cDNA phage display libraries were used to isolate cDNAs coding for epitopes reacting with antibodies present in the sera of patients with ovarian cancer. Screening of a T7 phage cDNA library with serum containing polyclonal antibodies against a known protein leads to the enrichment of one particular phage clone (which displays the peptide sequence recognized by the antibody on its coat) after several rounds of biopanning. Serum containing polyclonal antibodies was raised against a C-terminal 12 amino acid peptide from the human homologue of the yeast SIRT2 protein and screened against a T7 phage human brain cDNA library. This library was used because the Sirt2 transcript is expressed in human brain. Preimmune rabbit serum was bound to protein-G agarose beads and 6×10¹⁰ phage were added to the beads. The unbound phage were then bound to protein-G agarose beads to which the Sirt2p antibody was previously bound. The nonspecifically bound phage were washed away with PBS and the specifically bound phage eluted with 1% SDS. T7 phage is stable in this solution. These phage are diluted to reduce the SDS concentration and used to infect bacteria for amplification and another cycle of biopanning. Table 1 shows the value of the titer of the T7 phage library after each cycle of biopanning. This table reveals that the titer of the eluate increased with each successive cycle of antibody selection.

E. coli BLT5615 infected with amplified phage library after biopanning 1-4 were plated onto LB-Agar plates and plaque lifts were performed for all the individual plates. The plaque lift filter membranes were then hybridized with a ³²P-labeled Sirt2 cDNA probe. The percentage of positive plaques (number of positive plaques/total number of plaques×100), as determined for each of the plates labeled BP1-4 (FIG. 1), increased with each successive cycle of biopanning. For BP1 and BP2 the percentage of positive plaques was negligible. For BP3 and BP4, the percentage of positive plaques was 1.7% and 8.6% respectively.

In order to confirm that those positive plaques contain phage clones displaying the peptide sequence of Sirt2, 50 plaques were randomly picked and each insert was PCR amplified using T7 coat protein forward primer (5′TCTTCGCCCAGAAGCTGCAG3′) and T7 coat protein reverse primer (5′CCTCCTTTCAGCAAAAAACCCC3′). Filter hybridization was performed using the same Sirt2 cDNA probe as above. As shown in FIG. 2, 7 out of 50 plaques (14%) hybridized to the Sirt2 probe, a frequency similar to that observed in the plaque lifts. Plaques positively reacting with the Sirt2 probe were picked and also hybridized on Southern blots of PCR product.

Sirt2-positive plaques (upper two rows) and Sirt2-negative plaques (lower two rows) were chosen and 1 μl (pfu indicated at left) of each amplified phage clone was spotted onto the nitrocellulose membranes, which were then treated as if they were standard immunoblots using the rabbit polyclonal Sirt2 antibody (right panel) or a mouse monoclonal antibody to the T7 capsid protein (left panel). The rabbit polyclonal antibody provides a sample for testing as if it were a patient's serum using the Sirt2 protein as a model. The Sirt2 antibody in the rabbit polyclonal serum reacted specifically with the Sirt2 phage. The identity of the phage was confirmed by direct PCR sequence analysis of the cDNA inserts in two independent Sirt2-positive phage. Thus phage expressing the epitope to which the antiserum was directed were isolated and distinguished from other phage.

Microarrays were spotted using Sirt2 T7 clones and other T7 clones that do not express Sirt2. These arrays were used to analyze a mixture of Cy5-labeled (red) rabbit Sirt2-immunized serum and Cy3-labeled (green) T7 coat protein antibody (Novagen) added to the pre-immune rabbit serum. The scanned two-color image clearly shows specific detection of the Sirt2-expressing T7 clones by the anti-Sirt2 antibody. The Sirt2-expressing clones appear yellow because they bind both the red-labeled antibody to a rabbit immunoglobulin G protein and the green-labeled anti-T7 capsid 10B antibody. The non-Sirt2-expressing T7 clones are green as they only bind to the Cy3-labeled anti-T7 antibody. This development of detection of protein epitopes in bacteriophage bodes well for the applicability of phage arrays to the detection of low abundance species and weak binders. The spots in the image are approximately 100 microns in diameter.

The following is an example of the preparation of a tumor reactive cDNA expression library: Ovarian cancer cells were grown in monolayer culture. Cells or fresh tumors from patients were lysed by the addition of 3 ml of TRIZOL reagent and the homogenized sample was incubated for five minutes at room temperature. Chloroform, 0.6 ml, was added and the mixture was shaken vigorously for 15 seconds and then incubated at room temperature for 2-3 minutes. The extract was centrifuged at 12,000×g for 30 minutes at 4° C. Following centrifugation, the mixture separated into a lower red, phenol-chloroform phase, an interphase, and a colorless aqueous phase. The aqueous phase was transferred to a fresh tube and total RNA was precipitated by adding 1.5 ml of isopropanol. The mixture was incubated at room temperature for ten minutes and was centrifuged at 12,000×g for 30 minutes at 4° C. The supernatant was discarded and the RNA pellets were washed by adding 3 ml of 75% ethanol. The samples were centrifuged at 14,000×g for 15 minutes. The RNA pellet was air-dried and was dissolved in RNase-free water.

mRNA was isolated from total RNA following the Oligotex mRNA spin column protocol. Total RNA, 0.5 mg, was dissolved in 500 μl of RNase-free water, and 500 μl of binding buffer and 30 μl of Oligotex suspension were added. The contents were mixed thoroughly, incubated for three minutes at 70° C. in a water-bath, and then at room temperature for 10 minutes. The Oligotex:mRNA complex was pelleted by centrifugation for 2 minutes at 14,000×g and the supernatant was discarded. The Oligotex:mRNA pellet was resuspended in 400 μl washing buffer by vortexing and pipetted onto a spin column placed in a 1.5 ml microcentrifuge tube. The samples were centrifuged at maximum speed for one minute and the flow-through discarded. The spin column was transferred to a new RNase-free 1.5 ml microcentrifuge tube. Elution buffer at 70° C. was then added to the column. Poly (A)⁺ mRNA was eluted and quantitated by UV spectroscopy, and the process of poly A selection was repeated one more time to further reduce contamination with ribosomal RNA. Twice poly A selected mRNA was stored at −70° C. for use in library preparation.

Novagen's OrientExpress cDNA Synthesis and Cloning systems were used for the construction of ovarian cancer cDNA T7 phage libraries. For first-strand cDNA synthesis, the OrientExpress Random Primer System was used to ensure representation of both N-terminal and C-terminal amino acid sequences.

Ten ml of LB/carbenicillin medium were inoculated with a single colony of BLT5615 from a freshly streaked plate. The mixture was shaken at 37° C. overnight. Ten ml of the overnight culture was added to 90 ml of LB/carbenicillin medium and was allowed to grow until OD₆₀₀ reached 0.4-0.5. IPTG (1 mM), M9 salts (1×) and glucose (0.4%) can be added and the cells were allowed to grow for 20 minutes. An appropriate volume of culture was infected with phage library at MOI of 0.001-0.01 (100-1000 cells for each pfu). The infected bacteria were incubated with shaking at 37° C. for one to two hours until lysis was observed. Glycerol (0.02%) and PMSF (0.02M) were added to the cell lysate to block proteolysis of the capsid fusion proteins. The phage were centrifuged at 8000×g for 10 minutes. The supernatant was collected and was stored at 4° C. The lysate was titered by plaque assay under standard conditions. The libraries are stored after purification by polyethylene-glycol precipitation and ultracentrifugation through a stepwise CsCl gradient.
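The culture volume needed for a given multiplicity of infection follows from the definition MOI = pfu/cells. The short sketch below shows the arithmetic. The function name is hypothetical, and the cell density of the culture is an assumed input that must be measured for the actual culture; the numbers in the example are illustrative and are not taken from this work.

```python
import math


def culture_volume_ml(phage_pfu, moi, cells_per_ml):
    """Volume of culture that supplies phage_pfu / moi host cells."""
    cells_needed = phage_pfu / moi
    return cells_needed / cells_per_ml


# Hypothetical example: 1e7 pfu at MOI 0.01 (100 cells for each pfu),
# with an assumed culture density of 2e8 cells per ml.
volume = culture_volume_ml(phage_pfu=1e7, moi=0.01, cells_per_ml=2e8)
print(round(volume, 2))
```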

Using this approach, applicants have constructed the first library. Using twice poly A selected mRNA from SKOV3 cells, a T7 select cDNA library was prepared containing 1.8×10⁷ initial plaques after packaging. This representation is comparable to the clonal representation of the commercial libraries purchased. This library has been amplified and stored in aliquots in two −70° C. freezers.

Patients' sera were obtained from multiple institutions for this project. Three outside institutions have agreed to provide ovarian cancer patient sera and the associated medical record information in anonymized form. Dr. Steven Witkin from the Weill Medical College of Cornell University provided 46 patient serum samples and 27 controls. Dr. Karen Lu from the M.D. Anderson Cancer Center can provide 60 serum samples. Dr. David Fishman from the Northwestern University Comprehensive Cancer Center provided 35 serum samples of patients who have been followed from time of diagnosis.

The ideal sera for the clone biopanning studies come from women just before or after surgery and prior to chemotherapy. Follow up sera were obtained after chemotherapy and are important to determine whether the final protein array technology can detect tumor recurrence.

In addition, a supply of tumor tissue was required for the preparation of mRNA for cDNA library production and gene expression studies using samples from patients. This tissue was harvested within 20 minutes of surgical excision from the patients. This requires the coordinated effort of the gynecologic surgeons and pathologists. Patients at the time of their original surgery or prior to chemotherapy were accrued for serum collection. If tumor tissue was available in excess of that needed for routine pathologic evaluation, that tissue was used for RNA preparation for mRNA expression studies associated with this study. Sections from tissue blocks were also acquired for the purpose of expression studies of proteins in the patients' tumors. Patients at follow up visits to the OB/GYN clinics were also subjects for serum acquisition. These latter patients can be at a time of recurrence or not. This allows the observation of the reappearance of serum markers in the event of tumor recurrence. Serum was obtained from eligible patient-subjects during scheduled clinic visits. The initial serum acquisition occurs prior to surgery, if possible, or if post surgery, prior to chemotherapy. A single red top 7 cc vial of blood was obtained during normal phlebotomy and the serum isolated after clotting. Serum continues to be collected from these patients during follow up visits for up to five years or until ovarian cancer recurrence. Tumor tissue in excess of that required for pathological analyses was acquired at the time of surgery for the preparation of tumor RNA needed for antibody screening. Unaffected volunteers (controls) were recruited through community outreach activities.

The Biopanning Process

Steps in the Biopanning Process:

Affinity selection with sera from normal individuals: Twenty-five μl of Protein G Plus-agarose beads were placed in a 0.6 ml eppendorf tube and were washed two times with 1×PBS. Washed beads were blocked with 1% BSA at 4° C. for one hour. The beads were then incubated at 4° C. for three hours with 250 μl of pooled sera at a dilution of 1:20 from 20 control women. After three hours of incubation, beads were washed three times with 1×PBS and then incubated with phage library (˜10¹⁰ phage particles). After incubation, the mixture was centrifuged at 3000 rpm for two minutes to remove phage nonspecifically bound to the beads and the supernatant (phage library) was collected for immunoscreening.

Fresh protein G Plus-agarose beads were placed into a 0.6 ml eppendorf tube and were washed two times with 1×PBS. Washed beads were blocked with 1% BSA at 4° C. for one hour. The beads were then incubated at 4° C. for three hours with 250 μl of sera at a dilution of 1:20 from patients with ovarian cancer. After this incubation, the beads were washed three times with 1×PBS and were incubated with the phage library supernatant from above (termed Biopanning 1 (BP1)) collected for immunoscreening at 4° C. overnight (shorter times of incubation have not proven successful using model antibody systems). After incubation, the mixture was centrifuged at 3000 rpm for two minutes and the supernatant can be discarded. Beads were washed three times with 1×PBS. To elute the bound phage, 1% SDS was added to the washed beads and the mixture was incubated at room temperature for ten minutes. The bound phage were removed from the beads by centrifugation at 8000 rpm for seven minutes. Eluted phage were transferred to liquid culture for amplification (100 μl elution to 20 ml culture). Four rounds of affinity selection and immunoscreening were carried out with amplified phage obtained after each biopanning. The number of biopanning cycles generally determines the extent of the enrichment for phage that bind to the sera of patients with ovarian cancer. This process allows for one cycle of biopanning to be performed in a single day.

In the past, serum markers have been identified using SEREX technology that detected only a few gene products at a time. The biopanning approach developed can isolate large numbers of target epitopes. These epitopes are displayed on the surface of bacteriophage as in-frame fusion proteins with the T7 phage capsid protein and can be analyzed in large numbers by arraying the selected phage on filter paper or glass slides (protein microarrays). The method isolates large numbers of phage that react with antibodies from pooled patient sera but not with normal sera.

The titer of the T7 phage library obtained after amplification of each Biopanning (BP1-4) eluate was determined by plaque assay. E. coli BLT 5616 were infected with the primary unamplified phage from biopanning (BP3-4) and plaqued to limiting dilution onto LB/carbenicillin plates (150 mm×15 mm petri dish) so that sufficient numbers of single plaques can be isolated to obtain 12×96 well plates for arraying. The plates were incubated at 37° C. for 3-4 hours until the plaques were visible, and the plaques were then picked for amplification in the 12×96 well plates. After two hours, lysis of the host bacteria occurs in the wells of the 96-well plates. One well of each plate was uninfected as a control. Five 96 well plates of 200 μl phage lysates are clarified by centrifugation of the phage. The phage were cleared by whole plate centrifugation before robotic spotting in triplicate onto filters or glass slides. Excess reactivity in the surface area of the slide not spotted with phage is blocked using BSA, 1% solution in PBS, for 60 minutes, followed by washing in water three times. After blocking, the arrays on glass slides or filters were incubated with various dilutions of each of the individual control and patient sera, spotted in triplicate or more for each dilution of serum. Serum antibodies binding to recombinant proteins expressed on the surface of the T7 bacteriophage were detected by incubation with a Cy5-labeled anti-human IgG goat antiserum and visualized and quantified using GenePix and ImaGene software in a 4000B array scanner (AXON Instrument). As a positive control for each spot, a Cy3-labeled antibody for the T7 capsid protein was used. The fluorescence intensity for the human antibodies was normalized to the T7 capsid antibody reactivity. Initial testing of phage solutions was performed on a spotting robot.

The optimal number of subtractive biopannings for each serum sample is determined by picking individual phage clones and then testing the antibody reactivity for the serum used in the biopanning against those clones (referred to as its self reaction). Plates of 96 clones were picked for each patient's biopanning at cycles 3, 4, and 5, which were then tested for the binding of the phage clones to antibodies in that serum, in a “self-reaction”. Antibody binding is detected by spotting the filters with a 96 pin head on a Biomek robot or detected on glass slides of microarrays of phagotopes. The filters are then treated like a western blot by blocking with 1% dry milk powder in PBS and adding diluted serum. After rocking for 2 hours the filter is washed and reacted with an anti-human IgG antibody linked to horseradish peroxidase (HRP) and detected by ECL. From the clones isolated from one patient (designated patient #1), a total of 480 plaques were picked from that serum at biopanning 4. Biopanning four was chosen because about 35% of the clones bound antibodies from that patient's serum. Serum reactivity of the phagotopes with the patient's serum was detected at a 1:10,000 dilution, indicating a very high titer of the IgG molecules that react with the epitopes (self reaction with 480 clones). Reactivity to these clones is detected at similar dilutions using the clones arrayed on glass slides as an alternative solid support.

When the reactivity of sera from other patients (non-self reactions) was analyzed using replicates of the robotically spotted filters, reactivity was found in some patients, again at a dilution of 1:10,000 (FIG. 1 b). Other patients required a 1:3000 dilution of the serum for detection of the reactive clones (Table 1). Patient #23 reacted quite strongly while patient #16 reacted more weakly (FIG. 1 b and Table 1). Positivity was scored only when 3 out of 3 of the triplicates had similar intensity. In the subtractive biopanning scheme, phage that bound nonspecifically to normal serum proteins were removed by loading protein-G beads with a pool of control sera. Using 153 reactive clones of the original 480 clones, a positive reaction was detected on filters spotted with phage epitope clones for 13 of 21 other patients. Filters were tested with control sera not used in the initial subtractive step, and 5 of the 8 controls showed no reaction to the 480 phage on the filter arrays, while a non-specific and even pattern of reactivity to all clones (without the typical triplicate pattern) was observed using 3 of the 8 different control sera (Table 1).

# of phage Patient #1 BP4 clones reacted with each patient's sera at indicated dilution (1:10000, 1:3000)

Patient's sera     Result
PATIENT 1          153 (self reaction)
PATIENT 2          None (1:10000), 142 (1:3000)
PATIENT 16         NS
PATIENT 20         70
PATIENT 23         137
PATIENT 29         NS
PATIENT 30         NS
PATIENT 33         NS
PATIENT 35         NS (1:10000), 72 (1:3000)
PATIENT 37         None (1:10000), 120 (1:3000)
PATIENT 01-056     NS
PATIENT 01-060     None (1:10000), 61 (1:3000)
PATIENT 00-007     NS
PATIENT 01-108     NS
PATIENT 01-045     NS
PATIENT 42501      40
PATIENT 400162     120
PATIENT 40036      Mostly NS
PATIENT 42780      85
PATIENT B755       NS
PATIENT 40015      NS
PATIENT 075        119
PATIENT 015        155
PATIENT 035        NS
PATIENT 007        114
PATIENT 005        133
PATIENT 083        150
PATIENT 054        92
PATIENT 064        NS
PATIENT 065        NS

Table 1. NS indicates non-specific reaction only; None indicates no reaction detected.

The filter arrays are incubated with a patient's serum (pretreated for 2 hours at 4° C. with 150 μg of bacterial extract to block nonspecific reactions with E. coli proteins) at various dilutions for 1 hour at room temperature. Bacterial extracts are used because some patients have antibodies to bacterial protein; pre-treatment with extracts of E. coli proteins therefore blocks the nonspecific antibodies to bacterial protein present in the patient's serum. The membranes are then washed three times with TBST (0.24% Tris, 0.8% NaCl, and 1% Tween-20) for 15 minutes each. After washing is completed, the membranes are incubated with secondary antibody, HRP-conjugated goat anti-human IgG (Pierce), at 1:5000 dilution for 1 hour at room temperature. The membranes are again washed three times with TBST for 15 minutes each. Finally, the membranes are developed with Supersignal West Pico chemiluminescent substrate (Pierce) and the images are captured on Kodak film.

Phagotope Microarrays on Glass Biochips

Preparation of Arrays

Phage lysates are prepared as above. Phage lysates (usually five 96 well plates) from BP4 are transferred to 384-well plates, each lysate spotted in quadruplicate, using 10 μl per well. A robotic microarrayer is used to spot the phage in an ordered array onto FAST™ slides (Schleicher & Schuell) at a 350 μm spacing using 4 steel Micro-Spotting Pins. The arrays are dried overnight at room temperature.

Preparation of Fluorescent Antibody Probes

T7 monoclonal antibody and goat anti-human IgG are purchased from Novagen and Pierce, respectively. Monofunctional NHS-ester activated Cy3 and Cy5 dyes are purchased from Amersham (PA33001 and PA35001). The antibodies are labeled in pH 8.0 sodium carbonate buffer as per the instructions from the manufacturer. Briefly, 100 μl of the protein solution with 5 μl of coupling buffer is transferred to the vial of reactive dye and mixed thoroughly. The reaction is incubated in the dark at room temperature for 30 minutes with additional mixing approximately every 10 minutes. The reaction solutions are then loaded onto gel filtration columns to separate the labeled protein from non-conjugated dye. T7 antibody is labeled with Cy3 and anti-human IgG is labeled with Cy5. The labeled protein is eluted and stored at 4° C. for future use. Reversing the dye-labeling scheme of the antibodies does not affect the results. The advantage of this strategy is that the same reagents are used on every phagotope array and the only variable is the patient's serum; variations in labeling efficiency are therefore not a factor.

Detection of Fluorescent Antibody Probes

The arrays are rinsed briefly in 1% BSA/PBS to remove unbound phage, transferred immediately to 1% BSA/PBS as a blocking solution, and then incubated in this blocking solution for 1 hour at room temperature. The excess BSA is rinsed off the slides using PBS. Without allowing the array to dry, 2 ml of PBS containing human serum at a dilution of 1:10,000 is applied to the surface in a screw-top slide hybridization tube. Multiple dilutions are tested per patient to obtain optimal detection. The arrays are incubated at room temperature for 1 hour with mixing. The arrays are rinsed in PBS to remove the serum, and then washed gently three times in PBS/0.1% Tween-20 solution for 10 minutes each. All washes are performed at room temperature. After removing Tween-20 using PBS, the arrays are incubated with 2 ml of PBS containing Cy3-labeled anti-T7 capsid antibody at a dilution of 1:50,000 and anti-human IgG labeled with Cy5 at a dilution of 1:10,000 as probes for 1 hour in the dark. The incubation solution is mixed every 20 minutes. Three washes are performed using PBS/0.1% Tween-20 solution for 10 minutes each. The array is then rinsed twice with filtered ddH₂O and dried using a stream of compressed air.

Analysis of Phagotope Microarrays

The arrays are scanned in an Axon Laboratories scanner (Axon Laboratories, Palo Alto, Calif.) using 532 nm and 635 nm lasers. The ratio of the anti-T7 capsid and anti-human IgG signals is determined by comparing the fluorescence intensities in the Cy3- and Cy5-specific channels at each spot. The location of each spot on the array is outlined using the image processing software. The background, calculated as the median of pixel intensities from the local area around each spot, is subtracted from the average pixel intensity within each spot. This normalized reactivity is entered into a database for analysis.
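The per-spot calculation described above can be sketched as follows. This is a minimal illustration only, not the image processing software itself: the function names are hypothetical, the pixel lists stand in for the spot and local-background regions outlined by the software, and the ratio is taken as anti-human IgG (Cy5) over anti-T7 capsid (Cy3), which is an assumption about the direction of the ratio.

```python
from statistics import mean, median

def background_corrected(spot_pixels, local_background_pixels):
    """Mean spot intensity minus the median of the surrounding background pixels."""
    return mean(spot_pixels) - median(local_background_pixels)

def normalized_reactivity(cy5_spot, cy5_bg, cy3_spot, cy3_bg):
    """Background-corrected Cy5 (anti-human IgG) over Cy3 (anti-T7 capsid).

    The direction of the ratio is an assumption made for this sketch.
    """
    igg = background_corrected(cy5_spot, cy5_bg)
    capsid = background_corrected(cy3_spot, cy3_bg)
    if capsid <= 0:
        return None  # no usable capsid signal, so the ratio is undefined
    return igg / capsid
```

Returning None when the capsid signal does not exceed background keeps spots with no detectable phage out of the database rather than storing a meaningless ratio.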

The information in this database can be analyzed in order to: i) select the most informative epitopes and ii) develop a diagnostic test for tumor occurrence in high-risk women or tumor recurrence in women previously treated for ovarian cancer. The gene products thus identified can provide insight into molecular changes recognized by the host immune system.

The human antibodies reacting at each spot are detected with Cy5-labeled anti-human IgG antibodies. The fluorescence at each spot is normalized by comparison to the reaction with a Cy3-labeled antibody to the T7 phage capsid protein. Only a small fraction of the phage capsid protein is substituted with the in-frame fusion of the human cDNAs of the library. The majority of the capsid protein is produced by the host bacterium from an episomal T7 capsid gene. Therefore the majority of the capsid protein is wild-type and can react with the anti-capsid antibody. An example of a Cy5-labeled anti-human IgG reacting with IgG in patient #1's serum bound to clones biopanned using patient #1's serum is shown in FIG. 6 c.

The data analysis proceeds according to the following steps:

1. Pre-processing and normalization.

2. Identifying the most informative markers.

3. Building a predictor for molecular diagnosis of ovarian cancer and validating the results.

The pre-processing and normalization step is used for arrays using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background-corrected value is calculated by subtracting the background from the signal. If necessary, non-linear dye effects can be eliminated by performing an exponential normalization (Houts, 2000) and/or a piece-wise linear normalization of the data obtained in the first round. The exponential normalization can be done by calculating the log ratio of all spots (excluding control spots or spots flagged for bad quality) and fitting an exponential decay to the log (Cy3/Cy5) vs. log (Cy5) curve. The curve fitted is of the form:

y=a+b*exp(−cx)

where a, b and c are the parameters to be calculated during curve fitting. Once the curve is fitted, the values are normalized by subtracting the fitted log ratio from the observed log ratio.
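One way such a fit could be carried out is sketched below. The fitting method is an assumption, since the text does not state one: candidate values of c are scanned from a caller-supplied grid, and for each candidate the best a and b follow from ordinary linear least squares because the model is linear in a and b once c is fixed. The function names and the grid are hypothetical.

```python
import math

def fit_exponential(x, y, c_grid):
    """Fit y = a + b*exp(-c*x) by scanning c and solving a, b by linear least squares."""
    n = len(x)
    y_mean = sum(y) / n
    best = None
    for c in c_grid:
        e = [math.exp(-c * xi) for xi in x]
        e_mean = sum(e) / n
        var_e = sum((ei - e_mean) ** 2 for ei in e)
        if var_e == 0:
            continue  # constant regressor: a and b cannot be separated
        b = sum((ei - e_mean) * (yi - y_mean) for ei, yi in zip(e, y)) / var_e
        a = y_mean - b * e_mean
        sse = sum((yi - (a + b * ei)) ** 2 for ei, yi in zip(e, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable value of c in c_grid")
    return best[1], best[2], best[3]

def normalize_log_ratios(log_cy5, log_ratio, c_grid):
    """Subtract the fitted log ratio from the observed log ratio for each spot."""
    a, b, c = fit_exponential(log_cy5, log_ratio, c_grid)
    return [r - (a + b * math.exp(-c * x)) for x, r in zip(log_cy5, log_ratio)]
```

A finer grid, or a general non-linear optimizer, would give a more precise estimate of c; the grid scan is used here only because it needs no external library.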

This normalization has been shown to obtain good results for cDNA microarrays, but it relies on the hypothesis that the dye effect can be described by an exponential curve. The piece-wise linear normalization can be done by dividing the range of measured expression values into small intervals, calculating a curve of average expression values for each such interval, and correcting that curve using piece-wise linear functions.

The values coming from each channel are subsequently divided by the mean of the intensities over the whole array. The ratio between the IgG and the T7 channels is then calculated. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/− two standard deviations) are flagged for manual inspection. Single channel arrays are pre-processed in a similar way but without taking the ratios. This preprocessing sequence was shown to provide good results for all preliminary data analyzed.

The step of selecting the most informative markers is used to identify the most informative phages out of the large initial set of phages. The better the selection, the better the expected accuracy of the diagnostic tool.

A first test (Procedure 1 disclosed above) is necessary to determine whether a specific epitope is suitable for inclusion in the final set to be spotted.

Procedure 2 is used to maximize the information content of the set of epitopes while minimizing the number of epitopes used, as follows.

The arrays used in this example (using two channels, such as Cy5 for the human IgG and Cy3 for the T7 control) are processed as follows. The spots are segmented and the mean intensity is calculated for each spot. A mean intensity value is calculated for the background as well. A background-corrected value is calculated by subtracting the background from the signal. The values coming from each channel are normalized by dividing by their mean. Subsequently, the ratio between the IgG and the T7 channels is calculated and a logarithmic function is applied. The values coming from replicate spots (spots printed in quadruplicate) are combined by calculating the mean and standard deviation. Outliers (outside +/− two standard deviations) are flagged for manual inspection.
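The processing steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the inputs are taken to be background-corrected and strictly positive, and base 2 is assumed for the logarithm because the text does not name a base.

```python
import math
from statistics import mean, stdev

def scale_by_mean(values):
    """Divide each background-corrected intensity by the mean of its channel."""
    m = mean(values)
    return [v / m for v in values]

def log_ratios(igg, t7):
    """log2 of mean-scaled IgG over mean-scaled T7 for each spot (base 2 assumed)."""
    return [math.log2(i / t) for i, t in zip(scale_by_mean(igg), scale_by_mean(t7))]

def combine_replicates(values, n_sd=2.0):
    """Return mean, sample SD, and indexes of replicates beyond n_sd SDs from the mean."""
    m, sd = mean(values), stdev(values)
    flagged = [k for k, v in enumerate(values) if sd > 0 and abs(v - m) > n_sd * sd]
    return m, sd, flagged
```

One point worth noting when applying the two-standard-deviation rule: with a sample standard deviation computed from a single group of n replicates, no value can lie more than (n−1)/√n standard deviations from the group mean, which is 1.5 for quadruplicates. The flag therefore only triggers for larger replicate groups, or when the standard deviation is estimated from more spots than the group being checked.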

The histogram of the average log ratio is calculated. If the histogram is unimodal (e.g. subject 19218 in FIG. 13), there is no specific response. If the histogram is clearly bimodal (e.g. subject 19223 in FIG. 13), there is a specific response. All 25 subjects analyzed so far fell in one of these two categories or had no response at all. The preliminary data analyzed so far showed a very good separation of the distributions for the patients.

Once the chosen clones are spotted on the final version of the array, a number of sera coming from both patients and controls can be tested. These sera come from subjects not used in any of the phases that led to the fabrication of the array (i.e. not involved in clone selection, not used as controls, etc.). Each test was evaluated using Procedure 3 above.

Building the Predictor

A number of machine learning and statistical techniques have been considered for this task. The following algorithms were tested: CN2 (Clark, 1989), C4.5 (Quinlan, 1993; Breiman et al., 1984), CLEF 1998), C4.5 using classification rules (Quinlan, 1993), incremental decision tree induction (ITI) (Utgoff, 1989), learning vector quantization (LVQ) (Kohonen, 1988; Kohonen, 1995), induction of oblique trees (OC1) (Heath and Salzberg, 1993; Murthy, 1993), Nevada backpropagation (NEVP) (Rumelhart et al., 1987), Constraint Based Decomposition (Draghici, 2001), k-nearest neighbors with k=5 (K5), Q* and RBFs (Musavi et al., 1992; Poggio and Girosi, 1990).

The generalization abilities and the reliability of these techniques have been tested extensively on various problems and data sets from the UCI machine learning repository (Blake et al., 1998). This repository contains a large collection of mostly real world data from a large variety of domains (including biological and medical), and constitutes a benchmark on which various algorithms and techniques can be tested.

Table 2 presents the accuracies obtained by these techniques on the selected problems. Table 3 presents the standard deviation of each such algorithm on the same problems. Based on these tests, applicant decided to start the tests by using constraint based decomposition (CBD), radial basis functions (RBFs) and decision trees (C4.5) as the three main candidates. The CBD was selected because it offers a high reliability across multiple trials (lowest standard deviation) and a good accuracy (second best). Furthermore, the CBD algorithm can also produce a logical expression describing the classifier produced. Such expressions allow one to understand the relative importance of various epitopes. The decision trees have been selected mainly because they can be mapped into logical expressions that can be compared to the one produced by the CBD. RBFs construct clusters by placing high dimensionality Gaussian functions on groups of given data points (one data point can be a set of expression values corresponding to a protein chip). This technique automatically calculates the number of clusters, their orientation (the eigenvectors of the correlation matrix of the expression vectors) and their widths. RBFs were expected to perform much better than k-means clustering and the other techniques already used in this context because RBFs avoid guessing (e.g. k in k-means clustering). Furthermore, extracting a model from the trained RBF architecture is straightforward. Again, this model can be compared with the models provided by the CBD and C4.5.

DATASET        C4.5   C4.5r  ITI    LMDT   CN2    LVQ    OC1    NEVP   K5     Q*     RBF    CBD
GLASS          70.23  67.96  67.49  60.59  70.23  60.69  57.72  44.08  69.09  74.78  69.54  68.37
IONOSPHERE     91.56  91.82  93.65  86.89  90.98  88.58  88.29  83.8   85.91  89.7   87.6   88.17
LUNG CANCER    40.17  39.84  38.47  55.49  37.17  55.71  54.28  33.12  68.54  60     65.7   60
WINE           91.09  91.9   91.09  95.4   91.09  68.9   87.31  95.41  69.49  74.35  67.87  94.44
PIMA INDIANS   71.02  71.55  73.16  73.51  72.19  71.28  50     68.52  71.37  68.5   70.57  68.72
BUPA           65.14  65.39  63     71.54  64.31  64.13  65.57  77.72  66.43  61.43  59.85  62.32
TICTAC TOE     83.52  99.17  92.89  89.61  98.18  65.61  78.56  96.91  84.32  65.7   72.19  75.1
BALANCE        64.61  75.01  76.76  93.27  80.89  89.54  92.5   91.04  83.96  69.21  89.06  90.08
IRIS           91.6   91.58  91.25  95.45  91.92  92.55  93.89  90.34  91.94  92.1   85.64  96
ZOO            90.27  90     90.93  96.61  91.91  91.42  66.68  92.86  67.64  74.94  X      94.29
AVG            75.92  78.42  77.87  81.84  78.89  74.84  73.48  77.38  75.87  73.07  74.22  79.75

Table 2 shows a comparison of several classification techniques. The table presents the accuracies obtained in various problems from the UCI machine learning repository. Each accuracy is the average of 10 trials.

DATASET        C4.5   C4.5r  ITI    LMDT   CN2    LVQ    OC1    NEVP   K5     Q*     RBF    CBD
GLASS          7.23   6.28   7.96   11.25  8.34   10.24  9.1    6.29   7.81   6.98   7.35   2.08
IONOSPHERE     2.82   2.58   2.71   3.51   3.29   3.36   2.21   3.81   4.14   4.7    6.45   2.56
LUNG CANCER    14.2   18.92  13.52  32.2   13.79  12.48  17.53  14.83  11.96  18.6   16.27  12.6
WINE           5.84   5.09   6.24   5.22   6.11   4.84   8.45   2.22   6.86   6.64   5.16   1.96
PIMA INDIANS   2.1    3.92   2.16   4.3    2.36   4.46   22.4   3.19   3.67   8.19   2.39   3.02
BUPA           5.74   6.05   4.23   6.63   7.99   7.14   8.45   11.97  7.22   4.25   7.92   2.05
TICTAC TOE     2.44   1.05   2.38   8.79   0.95   2.99   5.88   1.32   2.7    3.16   3.35   9.43
BALANCE        3.35   3.98   3      2.95   3.38   4.39   2.07   7.12   7.53   19.09  2.38   3.03
IRIS           5.09   5.09   4.81   4.71   5.95   3.73   4.68   7.45   4.1    5.28   27.37  4.35
ZOO            7.59   7.24   6.11   1.56   5.95   6.26   30.36  4.62   20.03  23.8   X      2.13
AVG            5.64   6.02   5.312  8.112  5.811  5.989  11.11  6.282  7.602  10.07  8.738  4.321

Table 3 shows a comparison of several classification techniques. The table presents the standard deviations obtained in a set of 10 trials on various problems from the UCI machine learning repository.

Furthermore, one can also implement and try the predictors used in (Golub et al., 1999) and (Alizadeh et al., 2000), which were shown to work well in cancer diagnosis problems similar to applicant's. The selection of the final predictor was based on the validation results obtained in the last step of the data analysis.

Validating the Predictor

In order to validate the predictors, the classical method of cross-validation was used (Breiman et al., 1984). The idea behind cross-validation is that the predictor is tested, not based on its abilities to simply memorize the data presented during the training, but based on its abilities to generalize the knowledge acquired during the training to previously unseen cases. For this reason, the predictor must be checked on data that belongs to the same distribution but was not used during the training. This can be implemented in several ways depending on the number of examples available. If only few examples (such as stage I patients, ~40 total) are available, reducing the size of the training set even further by setting patterns aside for generalization testing could jeopardize the training. In such cases, the algorithm is used with only n−1 of the n available patterns and tested on the remaining one. This is done n times, each time leaving out a different pattern. An average is calculated over the n experiments. This is known as the leave-one-out method. If more patterns are available, the pattern set can be divided into n different subsets of patterns. Then one subset can be left out of the training and used to test the generalization. Again, the value reported is an average of the n trials performed leaving out each of the n subsets. This method is known as n-fold cross validation. Finally, if the pattern set is very large (patients with stage III or IV cancer), it can simply be divided into a training set and a validation set. In this case, the generalization abilities of the technique can be characterized by its performance on the validation set.
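The leave-one-out and n-fold schemes described above can be sketched with a single routine. The sketch below is illustrative only: it uses a plain k-nearest-neighbor classifier as a stand-in predictor (one of the techniques listed earlier, not necessarily the one finally selected), assigns patterns to folds by index, and reports accuracy pooled over all held-out patterns, which equals the average over folds when the folds are of equal size. All function names are hypothetical.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Majority label among the k training patterns nearest to query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_validate(x, y, n_folds, k=5):
    """n-fold cross-validation accuracy; n_folds == len(x) gives leave-one-out."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(x)) if i % n_folds == fold]
        train_idx = [i for i in range(len(x)) if i % n_folds != fold]
        tx = [x[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        # Each held-out pattern is classified by a predictor that never saw it.
        correct += sum(knn_predict(tx, ty, x[i], k) == y[i] for i in test_idx)
    return correct / len(x)
```

In practice the patterns would usually be shuffled, and the folds balanced between patients and controls, before the index-based assignment used here.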

For each predictor, the specificity, sensitivity, positive predictive value and negative predictive value can be calculated using cross-validation data (i.e. values that have not been used in constructing the predictor itself). This ensures that the quality measures obtained in this study reflect the real world performance to be expected in the field.
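These four quality measures follow directly from the counts of true and false positives and negatives obtained on the held-out data. A minimal sketch, with a hypothetical function name, is given below; a measure is returned as None when its denominator is zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else None  # undefined when no cases fall in the denominator

    return {
        "sensitivity": ratio(tp, tp + fn),  # patients correctly called positive
        "specificity": ratio(tn, tn + fp),  # controls correctly called negative
        "ppv": ratio(tp, tp + fp),          # positive calls that are true patients
        "npv": ratio(tn, tn + fn),          # negative calls that are true controls
    }
```

For example, with purely hypothetical counts of 30 true positives, 4 false positives, 35 true negatives and 2 false negatives, the sensitivity is 30/32 and the specificity is 35/39.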

Once informative phagotopes were found, the gene encoding each phagotope was identified.

1. Identification of Genes Encoding the Phagotopes.

Phage clones specifically reacting with patient sera, as determined by microarray immunoscreening, can be amplified by PCR using T7 capsid forward and reverse primers. PCR fragments were purified and 100 ng of fragment was analyzed to determine the nucleotide sequence of the cDNA inserts. Sequence alignments are performed using BLAST software and GenBank databases. The sequence information can be used in several ways. Initially, the DNA sequence information provides a database of the frequency of reactivity to a particular epitope.

Diagnostic Markers Derived from the Combined Processes Including Biopanning, Assay of Patients' Sera with Epitopes on Filters and Biochips, and Identifying the Best Predictor/Marker of Disease.

DNA Sequence Analysis of Phagotope Clones

PCR amplified DNA sequences from 96 phagotopes that reacted with patient #1 and at least one other OVCA serum are shown in the table below. Some clones were isolated multiple times and one clone was represented 23 times out of the 96 clones analyzed. This was the human homologue of the oncogene Bmi-1 (GenBank NM_005180.1), which inhibits the expression of p14ARF and cooperates with c-myc (Lindstrom et al., 2001). The insert sizes for the Bmi-1 phage clones varied in coding capacity, depending on the isolate, between 67 and 94 amino acids in length. Eight other clones were represented twice and one was isolated three times. One of the genes isolated twice was heat shock protein 70, which has been shown to be overexpressed and antigenic in ovarian cancer tumors and was found to have been identified in the SEREX database 5 times. The open reading frame in the HSP70 clone is 109 amino acids in length. Another clone isolated two times among the 96 sequenced is a known cancer antigen called RCAS1, which is overexpressed in 58% of ovarian cancers and in many other cancers as well (Sonoda et al., 1996). RCAS1 is an estrogen regulated gene which can inhibit the immune system from killing a tumor (Nakashima et al., 1999). This information clearly indicates that this technology is capable of detecting cancer antigens that can be used for diagnostic and immunotherapy purposes. If overbiopanning had occurred, only a few different clones would have been found. However, because the remaining clones were isolated once each, this supports the conclusion that 4-5 biopannings are appropriate. In this first group of 480 clones, clones were isolated that reacted with approximately 60% of the OVCA patients using the macroarray filters, and more efficiently using the microarray technology. Additional epitope clones provide additional sensitivity for this assay.

Clone Name                                 GenBank ID

Clone found 23 times
Bmi-1 (oncogene)                           NM_005180.1

Clones found 2-3 times
HSP-70                                     XM_050984.1
RCAS1 (EBAG9)                              BC005249.1
A-kinase anchoring protein 220             XM_038666.1
G-protein gamma-12 subunit                 NM_018841.1
Neuronal apoptosis inhibitory protein 6    AF242431.1
hypothetical protein DC42                  XM_028240.1
WD repeat domain 1 (WDR1)                  XM_034454.1
zinc finger protein 313                    XM_009507.1

54 other clones isolated once each.

Serum reactivity toward a cellular protein can occur for two possible reasons: 1) expression of a mutated form of the protein by the tumor cells and 2) overexpression of the protein in the tumor cells. Identification of proteins detected by the host immune system in this fashion therefore provides mechanistic information about protein(s) that can be mutated or overexpressed in ovarian cancer. Such information provides insight into the molecular targets and mechanisms giving rise to ovarian cancer. Lastly, the sequences identified using the epitope-biopanning/phage microarray approach can be useful for early detection of cancer occurrence and recurrence by screening patients' sera and peritoneal fluids, and for providing immunogens for immunotherapy vaccines.

Example 2

A strategy was developed for serological detection of large numbers of antigens indicative of the presence of cancer, thereby using the humoral immune system as a biosensor. The high-throughput selection strategy involved biopanning of an ovarian cancer phage display library using serum immunoglobulins from an ovarian cancer patient as bait. Protein macroarrays containing 480 of these selected antigen clones revealed 44 clones that interacted with immunoglobulins in sera from all (32/32) ovarian cancer patients, but not with sera from either healthy women (0/25) or patients having other benign or malignant gynecological diseases (0/14). An informative subset of 26 antigen clones was chosen based on the criterion that the serum from each of a group of 16 patients interacted with at least one of the clones. When another, independent group of 16 serum samples was used, all 16 samples interacted with one or more of the 26 clones, whereas none of the sera from 12 healthy women did. The process of globally profiling disease relevant epitopes is known as “epitomics”.
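The selection criterion stated above, that every patient serum in the group interacts with at least one chosen clone, is an instance of a covering problem. The sketch below shows one common heuristic for it, a greedy set cover; this is an assumption offered for illustration, not a statement of how the 26 clones were actually chosen, and the function and clone names are hypothetical.

```python
def select_informative_clones(reactivity):
    """Greedy cover: reactivity maps clone name -> set of patient IDs whose serum reacted."""
    uncovered = set().union(*reactivity.values()) if reactivity else set()
    chosen = []
    while uncovered:
        # Pick the clone that reacts with the most still-uncovered patients
        # (ties go to the alphabetically first clone name).
        best = max(sorted(reactivity), key=lambda c: len(reactivity[c] & uncovered))
        chosen.append(best)
        uncovered -= reactivity[best]
    return chosen

# Hypothetical reactivity data for five patients and four clones.
example = {
    "cloneA": {1, 2, 3},
    "cloneB": {3, 4},
    "cloneC": {4, 5},
    "cloneD": {5},
}
subset = select_informative_clones(example)
```

The greedy choice does not guarantee the smallest possible subset, but it always returns a subset in which each patient who reacted with any clone reacts with at least one selected clone.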

In searching for a method for the early detection of ovarian cancer (OVCA), large numbers of potential diagnostic antibodies were identified and a high-throughput strategy was developed to clone antigen biomarkers. Because antibodies to any single antigen tend to detect only a small fraction of cancer patients, the necessity of screening a large panel of potential antigen markers was recognized. Therefore, a differential biopanning technique was used to screen T7 phage display cDNA libraries to isolate cDNAs coding for epitopes that bind antibodies present specifically in the sera of patients with early or late stage ovarian cancer but not antibodies in the sera of healthy women. Using a single OVCA patient's immunoglobulins (IgG) as bait, both established and novel antigen biomarkers were identified. Large numbers of cancer-associated antigens can be found by this phage display technique more rapidly than by standard SEREX analysis. This is due to the power of repeated cycles of selective enrichment possible with viable phage display cDNA biopanning, especially when screening is performed with serum containing a complex mixture of low-titer IgGs, compared to the single step screening possible with SEREX, which is biased toward the identification of antigens that can be detected at a relatively high titer of IgGs.

The antigens that were identified through this process have diagnostic value, with additional potential for development of therapeutic vaccines or imaging reagents. Since the host immune system can unravel molecular events (overexpression or mutation) critical to the genesis of ovarian cancer, this novel proteomics technology can identify genes with significant mechanistic involvement in the etiology of the disease. Our initial goal is to develop a serum-based test that can detect ovarian epithelial cancer at an early and curable stage.

Methods

Serum Samples.

Blood samples from ovarian cancer patients (Stages I-IV) and healthy controls were obtained from the Barbara Ann Karmanos Cancer Institute. Processing of blood to extract serum was performed in the laboratory. Briefly, blood samples were centrifuged at 2500 rpm at 4° C. for 10-15 minutes and the supernatants were stored at −70° C. until use.

Construction of T7 Phage Display cDNA Library from Ovarian Cancer Cell Line, SKOV3.

Isolation of mRNA from Total RNA.

Ovarian cancer cells were grown in monolayer culture. Total RNA was prepared using TRIzol reagent according to the manufacturer's instructions (Invitrogen, Carlsbad, Calif., USA). Total RNA, 0.5 mg, was used for the purification of poly(A)⁺ mRNA following the method suggested by the manufacturer (QIAGEN Inc, Valencia, Calif.). Poly(A)⁺ mRNA was quantitated by UV spectroscopy and the process of poly(A) selection was repeated once. Twice poly(A)-selected mRNA was stored at −70° C. for use in library preparation.

Construction of T7 Phage Display cDNA Library.

Novagen's OrientExpress cDNA Synthesis and Cloning systems were used in the construction of the ovarian cancer T7 phage cDNA libraries (Novagen, cDNA manual, TB247). The OrientExpress Random Primer System was used to achieve orientation-specific cloning between EcoRI and HindIII sites. First and second strand cDNA synthesis were carried out sequentially in the presence of 5-methyl dCTP. After second strand synthesis, the cDNA was treated with T4 DNA polymerase to blunt the ends. The addition of EcoRI/HindIII Directional Linker d(GCTTGAATTCAAGC) at the d(A)n:d(T)n end created a HindIII site d(AAGCTT) in which the two underlined bases were derived from cDNA. The two dT's were provided on the 5′ end of each first strand by the HindIII random primer d(TTNNNNNN). Excess linkers and small cDNAs (<300 bp) were removed by a gel filtration step as described in Novagen's manual TB247. The digestion of the cDNA with both HindIII and EcoRI thus yielded cDNA molecules ready for directional insertion into EcoRI/HindIII vector T7Select 10-3 arms. After vector ligation and packaging using T7 packaging extracts, the phage were plated to determine the library titer. About 50 phage clones were randomly picked and PCR was performed with the T7 forward primer (TCTTCGCCCAGAAGCAG) and T7 reverse primer (CCTCCTTTCAGCAAAAAACCCC) in order to determine the insert sizes. The insert sizes were found to range from 300 bp to 1.5 kb.

Amplification of Packaged Libraries by Liquid Culture Method.

10 ml of LB/carbenicillin medium was inoculated with a single colony of E. coli strain BLT5615 from a freshly streaked plate. The culture was shaken at 37° C. overnight. Five ml of the overnight culture was added to 90 ml of LB/carbenicillin medium and allowed to grow until the OD₆₀₀ reached 0.4-0.5. After the appropriate OD was reached, 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG), (1×) M-9 minimal salts and 0.4% glucose were added and the cells were allowed to grow for 20 minutes. An appropriate volume of culture was infected with the phage library at a multiplicity of infection (MOI) of 0.001-0.01 (100-1000 cells for each pfu). The infected bacterial culture was incubated with shaking at 37° C. for 1-2 hours until lysis was observed. After lysis, 0.02% glycerol, 0.02M phenyl-methyl sulphonyl fluoride (PMSF) and protease inhibitor cocktail (PIC) were added to the cell lysate to block proteolysis of the capsid fusion proteins. The phage lysate was centrifuged at 8000×g for 10 minutes. The supernatant was collected and stored at 4° C. The lysate was titered by plaque assay under standard conditions. The libraries were stored at −80° C. after purification by polyethylene-glycol precipitation and ultracentrifugation through a cesium chloride step gradient.
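The arithmetic behind the plaque-assay titer and the MOI can be made explicit with a short sketch. The function names and all numbers in the usage example are hypothetical; in particular, the number of cells in the culture must be supplied by the user, since no cell density is stated here.

```python
def titer_pfu_per_ml(plaque_count, dilution_factor, volume_plated_ml):
    """Plaque-forming units per ml of the undiluted lysate."""
    return plaque_count * dilution_factor / volume_plated_ml

def phage_volume_for_moi(n_cells, moi, titer):
    """Volume of phage stock (ml) that delivers `moi` pfu per bacterial cell."""
    return n_cells * moi / titer

# Hypothetical example: 150 plaques from 0.1 ml of a 10^6-fold dilution.
stock_titer = titer_pfu_per_ml(150, 1e6, 0.1)

# Hypothetical culture of 1e10 cells infected at MOI 0.001 (1000 cells per pfu).
volume_ml = phage_volume_for_moi(1e10, 0.001, 1e9)
```

An MOI of 0.001 corresponds to 1000 cells for each pfu and an MOI of 0.01 to 100 cells for each pfu, matching the range given above.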

Selection of T7 Phage Displayed cDNA Libraries with Human Sera.

Affinity Selection with Sera from Normal Individuals.

Twenty-five μl of Protein G Plus-agarose beads were placed into a 0.6 ml microcentrifuge tube and washed twice with 1× phosphate buffered saline (PBS). The washed beads were blocked with 1% bovine serum albumin (BSA) at 4° C. for 1 hour and then incubated at 4° C. for 1 hour with 250 μl of pooled sera from 20 healthy women at a 1:20 dilution. After 3 hours of incubation, beads were washed three times with 1×PBS and then incubated with phage library (~10¹⁰ phage particles) made from an ovarian cancer cell line, SKOV3. The mixture was centrifuged at 3000 rpm for 2 minutes to remove phage nonspecifically bound to the beads and the supernatant (phage library) was collected for immunoselection.

Immunoselection of the Phage Mixture with Serum from an Ovarian Cancer Patient.

Protein G Plus agarose beads were placed into a 0.6 ml microcentrifuge tube and washed two times with 1×PBS. The washed beads were blocked with 1% BSA at 4° C. for 1 hour and then incubated at 4° C. with 250 μl of a 1:20 dilution of serum from the ovarian cancer patient, MEC1. After 3 hours, the beads were washed three times with 1×PBS and incubated for immunoselection overnight at 4° C. with the phage library supernatant. After this incubation, the mixture was centrifuged at 3000 rpm for 2 minutes and the supernatant was discarded. The beads were washed three times with 1×PBS and the phage were eluted from the washed beads as per the manufacturer's instructions. The bound phage were removed from the beads by centrifugation at 8000 rpm for 8 minutes. Eluted phage (200 μl) were transferred to liquid culture for amplification (100 μl elution to 20 ml culture). Four rounds of affinity selection were carried out on the amplified phage obtained for each series of biopannings. The number of biopanning cycles generally determines the extent of the enrichment for phage that bind to the sera of patients with ovarian cancer. Four other serum samples from ovarian cancer patients were also used for immunoselection of clones. MEC1 gave the strongest binding with its clones and therefore those clones were selected for the remainder of this study.

Macroarray Immunoscreening.

The titer of the T7 phage library obtained after amplification of each biopanning (BP1-BP4) eluate was determined by plaque assay. E. coli BLT5615 was infected with the primary unamplified phage from biopanning (BP1-BP4) and plated to limiting dilution onto LB/carbenicillin plates (150 mm×15 mm petri dish) so that sufficient numbers of single plaques could be isolated to obtain 12×96 well plates for arraying. The plates were incubated at 37° C. for 3-4 hours until the plaques were visible, and the plaques were then picked for amplification in the 96-well plates. Lysis of the host bacteria generally occurred after 2 hours. After bacterial lysis, the plates were centrifuged at 3000 rpm for 20 minutes. The samples from the 96-well plates were arrayed onto a nitrocellulose membrane (Osmonics) using the Beckman Biomek 2000 liquid handling robot. This robot, equipped with a 96-pin printing head, spotted the samples contained in 96 well plates onto nitrocellulose membranes. The patterns were printed in a 4×4 configuration. Position A1 contained 16 spots, each representing a phage sample (FIG. 12A). Triplicates were printed from well A1 of each of five different 96 well plates (15 spots) and the 16th spot contained a positive control of diluted human serum, used in the 4 corners of the plate only, as shown by black arrows (FIG. 12A). After each round of spotting, the pins were washed in 0.1% SDS, sterile water, and then ethanol. After the spotting was completed, nitrocellulose membranes were blocked with 5% non-fat dry milk for 1 hour at room temperature. The membranes were then incubated with a patient's serum (pretreated with 150 μg of bacterial extracts for 2 hours at 4° C.) at a dilution of 1:10000 or 1:3000 for 1 hour at room temperature. Bacterial extract was used because some patients and controls had antibody binding to bacterial protein(s). The membranes were then washed three times with 0.24% Tris, 0.8% NaCl, 1% Tween-20 (TBST) for 15 minutes each and then incubated with secondary antibody, HRP-conjugated goat anti-human IgG (Pierce, Rockford, Ill., USA), at 1:5000 dilution for 1 hour at room temperature. The membranes were again washed three times with TBST for 15 minutes each and developed with Supersignal West Pico chemiluminescent substrate (Pierce, Rockford, Ill., USA), and the images were captured on X-ray film.

Stability of Serum Specimens.

One source of error in the immunodetection on macroarrays could be variability in serum sample preparation or storage. Therefore, a test was performed to determine whether some common handling conditions adversely affect the usefulness of the sera for the assays. For this test, several aliquots of the same serum sample from one ovarian cancer patient were subjected to various treatments: repeated freeze-thaw cycles (10 times), incubation of the blood sample at 37° C. for 72 hours before processing the serum, extended storage at 4° C., treatment at room temperature overnight, and heat treatment at 65° C. for 10 minutes. Freshly thawed serum, processed normally, served as a control. Robotically printed nitrocellulose membranes containing the set of 480 clones were later processed with each of those treated and untreated serum samples.

ELISA Macroarray Analysis.

Forty-four Stage I-IV clones, in triplicate, were arrayed onto a nitrocellulose membrane (Osmonics) using the Beckman Biomek 2000 liquid handling robot. Nitrocellulose membranes were blocked with 5% non-fat dry milk for 1 hour at room temperature and then incubated with patient or control serum (pretreated with 150 μg of bacterial extract for 2 hours at 4° C.) at dilutions of 1:1000, 1:3000, 1:10000 and 1:30000 for 1 hour at room temperature. Immunoreactivity was assessed with serum from patients or healthy controls. For one set, immunoreactivity was also assessed with a monoclonal antibody to the N-terminus of the T7 gene 10 protein at a dilution of 1:10000. This was performed as described for the macroarray immunoscreening. The intensity of each spot was measured using ImaGene software from BioDiscovery Inc, with background subtraction, and the intensity ratio was calculated using the following equation:

Intensity Ratio = (Mean of Clone)/(Mean of T7 for 12 replicates of that Clone) − (Mean of Blank Phage)/(Mean of T7 for 12 replicates of that Blank Phage).

The Intensity Ratio vs. serum concentration was plotted for each antigen clone.
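The calculation above can be illustrated with a minimal sketch. It is offered only as an example and is not the ImaGene procedure itself. It assumes that background-subtracted spot intensities are already available as lists of numbers; the function name `intensity_ratio` and all numeric values are hypothetical.

```python
from statistics import mean

def intensity_ratio(clone_spots, clone_t7_spots, blank_spots, blank_t7_spots):
    """Clone signal normalized to its T7 signal, minus the same ratio for blank phage.

    Each argument is a list of background-subtracted spot intensities.
    The T7 lists hold the anti-T7 gene 10 readings (12 replicates in the text).
    """
    clone_norm = mean(clone_spots) / mean(clone_t7_spots)
    blank_norm = mean(blank_spots) / mean(blank_t7_spots)
    return clone_norm - blank_norm

# Hypothetical intensities, for illustration only.
example_ratio = intensity_ratio(
    clone_spots=[900, 1000, 1100],
    clone_t7_spots=[500] * 12,
    blank_spots=[90, 100, 110],
    blank_t7_spots=[500] * 12,
)
```

Normalizing to the T7 capsid signal is intended to correct for differences in the amount of phage deposited per spot, so that ratios from different clones can be compared across serum dilutions.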

Sequencing of Phage cDNA Clones.

Individual phage clones were PCR amplified using forward PCR primer 5′ GTTCTATCCGCAACGTTATGG 3′ and reverse PCR primer 5′ GGAGGAAAGTCGTTTTTTGGGG 3′. PCR products were purified on 1% agarose gels. The bands were excised from gels under UV light and DNA was extracted and purified using a Qiagen gel extraction kit (Qiagen Inc, Valencia, Calif., USA). Fifty ng of each purified PCR product was sequenced using forward sequencing primer 5′ TGCTAAGGACAACGTTATCG 3′ by the Wayne State University DNA Sequencing Core Facility.

Results

Differential Biopanning of T7 Phage cDNA Expression Libraries Employing Sera Obtained from Women with Ovarian Cancer and Healthy Controls.

A method of differential biopanning to screen a T7 phage cDNA library prepared from an ovarian cancer cell line, SKOV3, was developed using a late stage ovarian cancer patient's serum (MEC1) as the bait to isolate tumor-specific antigens. First, the library was pre-adsorbed with sera pooled from 20 healthy controls so as to remove the antigen clones binding with common antibodies unrelated to cancer. The resulting phage were then bound to antibodies present in the serum of a cancer patient and the unbound phage removed. This selection procedure was repeated four times, amplifying the phage between cycles of biopanning. Groups of 96 clones were picked from the patient's biopanning at cycles 1, 2, 3 and 4. Amplified phage clones were spotted on nitrocellulose membranes, and useful phage clones were identified by their binding with patient IgG antibodies at a dilution of 1:10000. There was a significant enrichment for phage bearing epitopes that bound serum IgGs after the fourth round of biopanning. Because about 35% of the selected phage clones interacted with MEC1 serum IgGs after the fourth round of biopanning, further biopanning was not performed, to avoid reducing the diversity of phage clones.

Serological Detection of Antigens Using Macroarrays

The utility of such phage display antigen clone sets for the serological detection of cancer is best demonstrated by their interaction with sera from patients other than those used in the selection step. A set of 480 clones from the fourth round of biopanning was robotically spotted on nitrocellulose membranes. The binding of the cloned antigens with the IgGs in patients' sera was analyzed at a dilution of 1:10000. The strong positive interactions observed with the MEC1 serum indicated a relatively high titer of the IgG molecules that bound with the MEC1 clones (FIG. 12A). Several dilutions of the MEC1 serum were previously used for antigen detection and a dilution of 1:10000 produced the cleanest pattern of strong binding. Although 480 clones were identified from the biopanning with MEC1 serum as the bait, not all 480 clones interacted with the MEC1 serum (FIG. 12A). This can be explained by a non-specific interaction between phage clones and the Protein-G+ beads bearing the serum antibodies. When IgG binding with sera from other patients (non-self reaction) was analyzed using replicates of these robotically spotted macroarrays, cross-reactivity was observed in most patients at a dilution of 1:10000 (FIG. 12B-E). Sera from other patients required either a 1:3000 or 1:30000 dilution to detect positive clones. Binding was scored positive only when all 3 spots of a triplicate had similar intensity and when the intensity was significantly higher than the background intensity of other spots within the same patch. Sera from 71 individuals were tested; 10 were from women with early stage OVCA (Stage I and Stage I borderline), 22 from women with late stage OVCA, 14 from women with benign or other gynecological diseases, and 25 from healthy controls. Tumor histology and stage of all the patients used for the study are listed in Table 4. Late stage patients OVC015 and MEC23 bound more intensely than the Stage I patients 4679 and 4387 (FIG. 12B-E). In the subtractive biopanning scheme, some isolated phage epitope clones bound IgGs in control sera that had not been used in the initial subtractive biopanning steps. As expected, a fraction of the 480 phage clones on the macroarrays interacted with approximately 10% of the controls. All clones that interacted with the control sera were eliminated from further consideration. One hundred and forty-nine clones interacted with sera from Stage I-IV ovarian cancer patients but with none of the 25 control sera. Forty-four of these 149 clones interacted specifically with the Stage I-IV sera. The remaining 105 clones also interacted with sera from women who had benign tumors, endometrial cancers or other gynecological diseases and may represent biomarkers of gynecological disease in general. These clones were excluded because these conditions are a common source of false positive results in CA-125 clinical testing. A matrix summarizing the binding of the 44 selected Stage I-IV antigen clones to sera from patients and controls is shown in Table 5A. The derivation of this matrix was based on an agreement between two observers who analyzed the data independently, with 87% concordance.
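The scoring and exclusion rules described above can be sketched as follows. In the study, scoring was done by two independent observers, so the numeric thresholds here (`max_cv`, `n_sd`) are assumptions introduced only to make the rule concrete, and the function and key names are hypothetical.

```python
from statistics import mean, stdev

def is_positive(triplicate, background, max_cv=0.2, n_sd=3.0):
    """Score one clone on one membrane.

    triplicate: intensities of the three replicate spots of a clone.
    background: intensities of other (non-binding) spots in the same patch.
    max_cv and n_sd are assumed cut-offs, not values from the study.
    """
    m = mean(triplicate)
    if m <= 0:
        return False
    # "Similar intensity": low coefficient of variation across the triplicate.
    consistent = stdev(triplicate) / m <= max_cv
    # "Significantly higher than background": mean exceeds background by n_sd SDs.
    above_background = m > mean(background) + n_sd * stdev(background)
    return consistent and above_background

def classify_clone(hits):
    """hits: number of sera scored positive per group (keys are hypothetical)."""
    if hits["healthy"] > 0:
        return "excluded: binds healthy control sera"
    if hits["ovca"] == 0:
        return "no binding to OVCA sera"
    if hits["benign_or_other"] > 0:
        return "excluded: binds benign or other gynecological sera"
    return "OVCA-specific"
```

Applied to the counts reported above, the first rule removes clones that bind any of the 25 control sera, leaving 149 clones, and the third rule removes the 105 clones that also bind benign or other gynecological sera, leaving 44.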

Only 2/44 selected clones, 2G4 and 3B12, bound with MEC1 serum IgGs despite the fact that the T7 cDNA library was biopanned with MEC1 serum as the bait. A large number of clones interacting with the MEC1 serum were eliminated because they bound with sera from either healthy controls or patients having benign or other gynecological diseases. The best markers are those interacting with the most patients; these include clones such as 2H9 (13/32), 2G2 (13/32), 2B4 (12/32), and 2G4 (12/32), which had the highest frequency of IgG binding with sera from ovarian cancer patients. Three antigens, 2F7/2B4, 5C3/2G4 and 2E1/4A3, were found in multiple clones, resulting in a panel of 41 markers binding with IgGs in Stage I-IV ovarian cancer sera (Table 5A).

Although the 41 antigens collectively interacted with sera from all 32 patients, the number of clones needed to detect all 32 ovarian cancer patients could be reduced. The serum set from 32 patients was randomly divided into two groups. The first group (Group 1) consisted of 16 patients and 25 healthy women, and the second group (Group 2) consisted of the other 16 patients and 12 different healthy women. Group 1 was used to select the minimum number of clones necessary to detect all patients. The strategy of clone selection involved ranking the clones in order of decreasing binding with sera from ovarian cancer patients (Table 7A). Next, a combination of clones was selected that bound IgGs in sera from all of the ovarian cancer patients in the set. Twenty-six clones detected all of the ovarian cancer patients in Group 1 (16/16) (Table 7A); all but one patient's serum bound with more than one of the selected clones. These 26 clones were then tested for antibody binding on sera from Group 2 (16 patients and 12 healthy controls) (Table 7B). Sera from all of the patients in Group 2 (16/16) bound with at least one of these clones and none of the sera from the healthy women (0/12) bound to these clones.
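One way to carry out this kind of panel selection is a greedy cover, sketched below. The text does not state which selection algorithm was used, so the greedy strategy is an assumption, and the binding data, patient labels, and function names are hypothetical.

```python
def select_panel(binding):
    """Greedy cover. binding maps clone ID -> set of patient IDs whose serum bound it."""
    uncovered = set().union(*binding.values())
    panel = []
    while uncovered:
        # Pick the clone that binds the most still-undetected patients;
        # sorted() makes ties resolve to the first clone ID alphabetically.
        best = max(sorted(binding), key=lambda clone: len(binding[clone] & uncovered))
        panel.append(best)
        uncovered -= binding[best]
    return panel

def detects_all(panel, binding, patients):
    """True if every patient's serum binds at least one clone in the panel."""
    covered = set().union(*(binding.get(clone, set()) for clone in panel))
    return set(patients) <= covered

# Hypothetical binding matrices for a training group and a held-out group.
group1 = {
    "2H9": {"P1", "P2", "P3"},
    "2G2": {"P3", "P4"},
    "2B4": {"P2"},
    "2G4": {"P5"},
}
group2 = {"2H9": {"Q1"}, "2G2": {"Q2"}, "2G4": set(), "2B4": {"Q1"}}

panel = select_panel(group1)
validated = detects_all(panel, group2, {"Q1", "Q2"})
```

A greedy cover does not guarantee the smallest possible panel, but it mirrors the described procedure of ranking clones by binding frequency and then adding clones until every patient in the training group is detected. The held-out group is used only to check the selected panel.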

A second group of 21 clones was found to interact with sera from late stage patients (18/22) but not with sera from early stage patients, from 25 healthy women, or from 14 patients with either benign tumors, endometrial cancers or other gynecological diseases (Table 5B). Although 4 late stage patients were not detected by these 21 clones (Table 5B), they were detected by the 44 Stage I-IV clones (Table 5A). Among these 21 clones, antigen 2B3 interacted with the greatest number of patients' sera (10/22), clone 5A2 with 8/22, and clones 2D7 and 2E7 with 5/22 sera. Although these clones did not detect women with early stage ovarian cancer, further analysis may show them to be useful as markers of recurrence.

Stability of Serum Specimen.

An important feature of a test for widespread clinical use is the stability of the analyte in the test sample. To identify any inaccuracy in detecting IgG molecules in this multianalyte assay due to serum sample preparation problems or serum storage, a test of the durability of the serum samples was carried out. Serum was subjected to repeated freeze-thaw cycles (10 times) or heated to 65° C. for 10 minutes, or the unprocessed blood was left at 37° C. for 72 hours. Only heat treatment of the serum affected the positive signals on the macroarrays, because heat treatment is sufficient to denature immunoglobulins (IgG). Therefore, the complex set of IgG molecules in serum samples is very stable and provides a reliable analyte for clinical studies of diagnostic arrays of cloned antigens.

ELISA Macroarray Analysis.

The set of 44 (Stage I-IV) phage display cDNA clones listed in Table 5A was printed robotically on nitrocellulose membrane and an enzyme-linked immunosorbent assay (ELISA)-like experiment was performed. For clones 4A11, 2H9, 2G4 and 2F7, the binding of antigens decreased with increasing dilution of serum (FIG. 13A-D). Although clones bound nonspecifically with control sera at high serum concentrations, their binding decreased to zero as the sera were diluted, whereas the interaction of the same clones with IgGs in patients' sera persisted even at a 1:10000 serum dilution. This demonstrated that the interaction of antigen clones with patients' sera was a typical, titerable antigen-antibody interaction.

Phage-Coded Antigen Sequence Analysis

To identify the selected gene products, phage DNAs were amplified by PCR and the cDNA products sequenced. The DNA sequences were checked for homology to the GenBank databases using BLAST. The predicted amino acids in-frame with the T7 gene 10 capsid protein were determined. Eleven sequences were homologous to known gene products, while other clones had no homology to any annotated sequences in the public databases (Table 6A). Among the gene products, 11 represented known gene products in the correct orientation and in the correct reading frame with the T7 gene 10 capsid protein, indicating that the serum IgG binding region was localized to a portion of the natural open reading frame of the protein. Of the remaining 33 clones, 13 clones contained an open reading frame with the T7 10B gene with a frameshift within the natural reading frame of the gene; 7 clones contained portions of either 5′ or 3′ untranslated regions of known genes; and 13 clones contained segments of genomic sequences. This in turn resulted in the formation of recombinant fusion proteins in which the predicted amino acid sequence of the in-frame fusion with the T7 10B protein was not similar to the original protein coded by the gene. The size of the additional peptide sequences ranged from 5-48 amino acids. This result indicated that the recombinant gene products of these clones must be coding for proteins that mimic some other natural antigens, and hence can be termed mimotopes (Table 6A). A BLASTp search of the SWISSPROT database for homology to each in-frame mimotope confirmed this observation. For example, clone 2H5 contained a nucleotide sequence homologous to the ATP synthase, H+ transporter. Using BLASTp, a sequence homology of 8/10 amino acids with the leukocyte common antigen precursor was observed. Each mimotope had significant homology to a natural open reading frame (Table 6A).
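The effect of reading frame on the displayed peptide can be illustrated with a short sketch. It assumes the insert sequence has already been trimmed so that position 0 is the first base after the T7 10B fusion junction, and it uses the standard genetic code; the function name, the `frame_offset` parameter, and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_in_frame(insert_dna, frame_offset=0):
    """Translate an insert from the given offset until the first stop codon.

    frame_offset=0 reads the insert in the frame set by the upstream fusion;
    a non-zero offset models an insert that is shifted relative to that frame.
    Codons with ambiguous bases are reported as 'X'.
    """
    seq = insert_dna.upper()[frame_offset:]
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# Hypothetical insert: the same DNA gives unrelated peptides in different frames.
natural_frame = translate_in_frame("ATGGCTGATGATTTGTAA")
shifted_frame = translate_in_frame("ATGGCTGATGATTTGTAA", frame_offset=1)
```

In this toy example the two frames yield entirely different peptides from the same DNA, which is the situation described above for clones whose fusion peptide does not resemble the protein encoded by the parent gene.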

Discussion

The early detection of cancer is a significant challenge in clinical oncology. Once accurate methods become available, early detection can result in a significant reduction in morbidity and mortality of these diseases. The detection of ovarian cancer at Stage I could result in a cure rate of 90%. To this end, an approach has been devised for high-throughput selection of antigen biomarkers using phage display libraries and marker selection using a highly parallel analysis on macroarrays. The process began with a representative sample of 480 cloned markers from biopanning an ovarian cancer T7 phage display cDNA library with one patient's serum. It was first demonstrated that these clones bound to IgG molecules found in the sera of patients other than the one used for antigen selection. One hundred and forty-nine markers that bound to IgGs in sera from OVCA patients showed no interaction with sera from cancer-free women. Forty-one of these antigen biomarkers had positive interactions with early (including cancers with borderline histology) and late stage ovarian cancer patients, and there were no false positive interactions with IgGs in sera from either women having benign gynecological syndromes such as ovarian cysts and endometrial fibroids or women with endometrial cancer. Because Stage I and Stage I borderline tumors can elicit a detectable immune response in this assay, this technology is sensitive to very small tumor burdens (Table 5A). Sera from women with other cancers can be used to distinguish markers that are specific to ovarian cancer from those that bind to antibodies in sera from individuals with other cancers. Based on this representative sample of 480 clones from a single selection experiment, discovery of these markers was scaled up to larger numbers of epitope clones, cloning from additional libraries using sera from these and other women with ovarian cancer. Although the epitope markers were cloned using serum from a patient having the most common histologic type of ovarian cancer, serous adenocarcinoma, it has been shown that these markers are capable of detecting other histologic types of ovarian cancer, including endometrioid and clear cell tumors (Table 5A, Table 4). When the top-ranking 26 clones (Table 7A) were applied to the dataset comprising 16 patients and 12 healthy women, these clones bound to IgGs in the sera from 16 out of 16 patients (Table 7B). As none of these 26 clones showed binding to IgGs in sera from 25 healthy women in Group 1 or 12 healthy women in Group 2, it is likely they represent a promising discriminator between healthy and cancer sera. Larger studies with additional antigen biomarkers in other populations can be used to verify that the rate of diagnostic misclassification with this approach is small enough to justify its use in a clinical setting as a screening test for ovarian cancer.

Knowledge regarding the immunogenicity and expression pattern of serologically-defined tumor antigens is critical in assessing the therapeutic and diagnostic potential of those antigens. The present study demonstrates that the use of T7 phage display selected clones is an effective technique for molecular profiling of the humoral immune response in ovarian cancer. Within this initial panel of 41 biomarkers, 8/9 contained large portions of open reading frames of the parental proteins: 1F6 is the receptor-binding cancer antigen expressed on SiSo cells (human uterine adenocarcinoma cell line) (RCAS1); 3A9 is the signal recognition protein (SRP-19); 5C11 is the AHNAK-related sequence; 2B4 is the nuclear autoantigenic sperm protein (NASP); 3C11 is the ribosomal protein L4 (RPL4); 4H3 is the Nijmegen breakage syndrome 1 (nibrin) (NBS1); 2G4 is the eukaryotic initiation factor 5A (eIF-5A); and 5F8 is the Homo sapiens KIAA0419 gene product. With the exception of clone 4A11, which is the Homo sapiens chromodomain helicase DNA binding protein 1 (CHD1), all of the aforementioned gene products have a known or suspected etiological association with cancer. One of these markers, RCAS1, is overexpressed in many cancers such as uterine, breast and pancreatic cancer. As indicated by the broad overexpression of RCAS1 in human cancers, some of the antigens identified may not be specific to ovarian cancer. However, this does demonstrate that epitomics profiling of the humoral immune response in cancer patients can identify serum antibody markers that are relevant to the etiology of their cancer (e.g. overexpressed or mutated) with diagnostic and therapeutic value. Interestingly, these 9 antigens with parental open reading frames are predicted to be intracellular products. This finding is in agreement with reports using the SEREX procedure, whereby the majority of those antigens are also intracellular, and their probable release by necrosis or cell lysis at the tumor site is an initiating factor in eliciting an immune response.

The remaining 32 clones are mimotopes, defined as peptides that are capable of binding to the paratope of an antibody but are unrelated in sequence to the natural protein that the antibody actually recognizes. Such peptides are usually identified by testing combinatorial peptide libraries obtained by chemical synthesis or phage display for their ability to bind monoclonal antibodies specific for discontinuous epitopes. This is analogous to previous studies that have selected randomized peptide libraries on serum from Hepatitis B patients. Peptide mimotopes can potentially be used as a novel form of immunotherapy to induce a beneficial antitumor response. A mimotope derived from a phage display library can induce specific inhibition of the binding between tumor-inhibitory antibody and the Erb-2 receptor. Such mimotopes may represent a superior form of immunotherapy that may not elicit side effects due to autoimmunity to a natural protein.

In conclusion, using a combination of high throughput selection and array-based serological profiling that is called Epitomics®, a panel of 41 antigens was isolated, including 8 antigens previously associated with cancer. Further work with larger panels of antigens analyzed on macroarrays or microarrays can provide a comprehensive set of markers that can be evaluated using sera from other cancers for the specificity of an ovarian cancer test. This epitomics approach to antigenic profiling has applications to cancer, autoimmune diseases, and infectious diseases for diagnostic, therapeutic, and epidemiologic studies.

Example 3

The 480 clones described in Example 2 were screened against new independent samples of ovarian cancer patient and control sera, using the methods of Example 2. This procedure revealed 166 new clones of interest that discriminated cancer from non-cancer with 93% accuracy. Upon DNA sequencing it was found that 77 additional new antigens had been cloned. These antigens, listed in Table 6A, are epitopes including SEQ ID NOs: 90, 106, 135, 136, 145, and 150, and mimotopes including SEQ ID NOs: 76-89, 91-105, 107-144, 146-149, 151, and 152.

Example 4 Biopanning to Isolate Additional Antigens Using 4 Libraries and 8 Different OVCA Sera

Three additional T7 Phage Display OVCA cDNA libraries were prepared according to methods described in Examples 1 and 2. These three libraries, plus the library of Example 2, were biopanned against eight different patient sera. The properties of the sera are as follows:

Patients' Sera  Stage  Histology                          Number of Clones Chosen for Set of 2800 Antigens
OVC063          III    Malignant Serous                   384
OVC065          IC     Malignant Serous                   384
OVC087          1A     Malignant Endometrioid Clear Cell  384
OVC0156         1A     Malignant Serous                   384
OVC023          IIIC   Malignant Serous                   96
Mec1            III    Malignant Serous                   384 + (480 from Example 2)
OVC0155         I      Malignant Mucinous                 384
OVC0111         IV     Malignant Mucinous                 384

Positive clones from biopanning cycle 4 were selected on the basis of having strong reactions with sera from 30 patients and no reaction with more than 30 healthy controls. The best candidate markers were chosen on the basis of exhibiting a strong IgG binding signal in the self-binding chip and with at least two other patients' sera.

Clone Subselection from 2800 to 1010 Antigens Using 30 Patients and 30 Controls:

The number of clones was reduced such that they could be spotted on a single microarray for the large validation sets. The methods used include:

1) Bootstrapping method combined with an ROC analysis.
2) A parametric test (moderated t-test).
3) A non-parametric test (U-test: analysis on ranks; less sensitive to outliers).

The union of the top 600 clones from each of the 3 methods above yielded 776 clones, indicating that among the 2800 antigens many were found to be good markers by all methods. From these 776 markers, 432 were highly ranked consistently by all 3 methods. A number of negative controls were also chosen.
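The combination of the three ranked lists can be sketched with simple set operations. The sketch assumes each method has already produced a list of clone IDs ordered from best to worst; the function name and the toy clone IDs are hypothetical, and `top_n` would be 600 for the analysis described above.

```python
def combine_rankings(rankings, top_n):
    """Return (union, consensus) of the top_n clones from each ranked list.

    rankings: list of lists of clone IDs, each ordered best-first.
    union: clones in the top_n of at least one method.
    consensus: clones in the top_n of every method.
    """
    top_sets = [set(ranking[:top_n]) for ranking in rankings]
    union = set.union(*top_sets)
    consensus = set.intersection(*top_sets)
    return union, consensus

# Toy example with three hypothetical rankings and top_n=2.
bootstrap_roc = ["c1", "c2", "c3", "c4"]
moderated_t = ["c2", "c1", "c5", "c3"]
u_test = ["c2", "c6", "c1", "c4"]
union, consensus = combine_rankings([bootstrap_roc, moderated_t, u_test], top_n=2)
```

A union much smaller than three times `top_n` (776 rather than 1800 in the text) indicates that the three methods largely agree on which clones rank highly.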

Validation Serum Sets:

A set of protein microarrays was used to validate the above selected markers; also included were the 63 antigens from Example 2 and 81 antigens from Example 3. In a set of 1000 microarray experiments, 337 clones were obtained that were significantly different between healthy and OVCA by t-test at the level p<0.01 after correction for multiple experiments. Using this large series, an accuracy of 90% was obtained using neural networks, with 66% of the sera samples in the training set and 34% in the test set. From this process, 34 new antigen clones were identified as markers. These antigens, listed in Table 6A, are epitopes including SEQ ID NOs: 159, 170, and 182, and mimotopes including SEQ ID NOs: 153-158, 160-169, 171-181, and 183-186.
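The two bookkeeping steps in this validation, multiple-testing correction and the 66%/34% split, can be sketched as follows. The text does not name the correction method, so the Bonferroni rule used here is an assumption, as are the function names, the fixed random seed, and the example p-values.

```python
import random

def bonferroni_significant(p_values, alpha=0.01):
    """Return clone IDs whose p-value passes a Bonferroni-corrected threshold.

    p_values: dict mapping clone ID -> uncorrected p-value from a per-clone test.
    Bonferroni is an assumed choice; the source only says "after correction".
    """
    threshold = alpha / len(p_values)
    return [clone for clone, p in p_values.items() if p < threshold]

def split_sera(sample_ids, train_fraction=0.66, seed=0):
    """Shuffle serum sample IDs and split them into training and test sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical p-values and sample IDs, for illustration only.
significant = bonferroni_significant({"cloneA": 0.001, "cloneB": 0.004, "cloneC": 0.2})
train_ids, test_ids = split_sera([f"serum{i}" for i in range(100)])
```

Keeping the test sera out of both marker selection and classifier training is what allows the reported accuracy to be read as an estimate of performance on unseen samples.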

Example 5 Discovery of Candidate Autoantigen Biomarkers from Proteins Commonly Overexpressed in Ovarian Cancer Via Literature Mining

We have found that at least some of the novel OVCA-induced autoantigens are overexpressed, as determined by immunohistochemistry, in tumor versus normal and benign ovarian tissues (Ali-Fehmi et al, 2010). Therefore, a rational approach to augmenting the panel of biomarkers for the detection and staging of ovarian cancer is to identify potential additional biomarkers through a literature search for proteins overexpressed in ovarian cancer tissue, as determined by immunohistochemistry.

A search was conducted as follows. The search was initiated with a list of potential genes involved in any cancer. The list was obtained from the plasma proteome website (http://www.plasmaproteome.org/ppihome.htm) and contained a total of 1261 genes. We searched the literature using the search criteria “gene name and immunohistochemistry and OVCA”. The serous adenocarcinoma histotype was preferentially targeted for our list, though in the majority of the articles, immunohistochemical data were not stratified by OVCA histotype. Initially, potential markers were selected solely on the basis of the information presented in the abstract.

After generating the first list, relevant articles were read and a second cut was made based on expression level and expression in benign/normal tissue. Measures were taken to avoid proteins that were expressed in either benign or normal tissue. Exceptions were made for proteins that were expressed in normal tissue but showed significantly higher expression in cancer tissue. Also, attempts were made to avoid proteins expressed in borderline tumors. Secreted proteins were avoided because proteins shed from cancer cells into circulation can serve as blocking agents against autoantibodies. Cytokines were eliminated since cytokine levels can also be elevated due to inflammatory conditions and could therefore confound the purpose of early detection. An added criterion was the commercial availability of the overexpressed proteins. Commercial availability facilitates testing of potential markers in high throughput antigen microarrays and other immunoassay technologies.
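The second-cut criteria above can be written as a simple filter. This is a simplification: the text describes several criteria as preferences ("attempts were made"), whereas the sketch treats them all as hard rules. The record fields, the function name, and the gene labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    expressed_in_normal_or_benign: bool
    markedly_higher_in_cancer: bool
    expressed_in_borderline: bool
    secreted: bool
    cytokine: bool
    commercially_available: bool

def passes_second_cut(c):
    """Apply the literature-mining criteria as hard filters (a simplification)."""
    # Expression in normal or benign tissue is tolerated only when
    # expression in cancer tissue is significantly higher.
    if c.expressed_in_normal_or_benign and not c.markedly_higher_in_cancer:
        return False
    # Borderline-tumor expression, secretion, and cytokines are all avoided.
    if c.expressed_in_borderline or c.secreted or c.cytokine:
        return False
    # The protein must be obtainable for array printing.
    return c.commercially_available

# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("GENE_A", False, True, False, False, False, True),
    Candidate("GENE_B", True, False, False, False, False, True),
    Candidate("GENE_C", True, True, False, False, False, True),
    Candidate("GENE_D", False, True, False, True, False, True),
]
kept = [c.gene for c in candidates if passes_second_cut(c)]
```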

A total of 2522 abstracts and approximately 2000 articles were analyzed. The SEREX database was searched for evidence that the potential antigens elicit autoantibody reactions. The information obtained from the literature was then archived in a database. Two additional markers for OVCA were added from the list of Pathwork (Monzon et al., 2009).

The result was a table of 30 markers for OVCA that can be tested as potential autoantigens (Table 8), using samples of protein or peptide in the same manner as display phage in Examples 1-4 above.

Throughout this application, various publications, including United States patents, are referenced by author and year and patents by number. Full citations for the publications are listed below. The disclosures of these publications and patents in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.

The invention has been described in an illustrative manner, and it is to be understood that the terminology that has been used is intended to be in the nature of words of description rather than of limitation.

Obviously, many modifications and variations of the present invention are possible in light of the above teachings. It is, therefore, to be understood that within the scope of the appended claims, the invention can be practiced otherwise than as specifically described.

TABLE 4 Tumor Histology and Stage of Patients' Sera Used for Screening of Ovarian Cancer

Blood Specimen ID #  Histology                                          Stage
MEC1*                serous adenocarcinoma                              Unknown
MEC2                 serous adenocarcinoma                              IIA
MEC16                serous adenocarcinoma                              IV
MEC20                serous adenocarcinoma                              Unknown
MEC23                serous adenocarcinoma                              IIIC
MEC35                serous adenocarcinoma                              IIIC
MEC37                serous adenocarcinoma                              IIIC
TB01-060             serous adenocarcinoma                              IIIC
TB01-108             serous adenocarcinoma                              IIIC
42501                adenocarcinoma NOS                                 late
400162               adenocarcinoma NOS                                 late
40036                adenocarcinoma NOS                                 late
42780                adenocarcinoma NOS                                 late
B755                 adenocarcinoma NOS                                 late
40015                adenocarcinoma NOS                                 late
OVC075               serous adenocarcinoma                              IIC
OVC015               serous adenocarcinoma                              IIIC
OVC035               serous adenocarcinoma                              IIIC
OVC007               mixed epithelial                                   IIIC
OVC005               Malignant Mixed Mesodermal Tumor                   IIIC
OVC063               serous adenocarcinoma                              III
OVC045               serous adenocarcinoma                              IIIC
NW0629 (4387)        endometrioid adenocarcinoma                        IC
NW0453 (4679)        adenocarcinoma NOS                                 IC
NW0046 (4555)        borderline serous cystadenofibroma                 IA
NW1181 (4283)        endometrioid adenocarcinoma                        IA
OVC019               mixed epithelial                                   IC
OVC087               clear cell                                         IA
OVC078               endometrioid                                       IC
OVC070               borderline serous                                  IC
OVC049               mixed epithelial                                   IA
OVC079               borderline serous                                  I
33-38                benign ovarian cyst                                N/A
92-96                uterine myoma                                      N/A
80-82                endometrial adenocarcinoma                         IIIA
79-62                endometrial adenocarcinoma                         IIIA
35-27                benign ovarian cyst                                N/A
30-141               benign ovarian cyst                                N/A
70-153               endometrial adenocarcinoma                         IB
81-80                endometrial adenocarcinoma                         IA
31-55                benign ovarian cyst                                N/A
39-55                benign ovarian cyst                                N/A
36-11                endometrial polyp                                  N/A
32-43                Benign, thickening of endometrium                  N/A
OVC068-1 B           papillary serous adenoma (benign) & endometriosis  N/A
OVC054               benign serous cystadenoma                          N/A

*Serum used for biopanning

TABLE 5A Binding of 44 Clones with Late Stage and Stage I Ovarian Cancer Patient Sera

The binding of a panel of 44 clones with sera from 22 late stage and 10 Stage I ovarian cancer patients was determined. These 44 antigens listed below bound exclusively with serum IgGs derived from both late stage and Stage I ovarian cancer patients (including borderline histology) but not with serum IgG from normal controls or patients with other gynecological diseases. The grey colored boxes represent positive binding of phage clones with patients' sera. TP: Total number of patients whose serum IgGs bound to each phage clone. •: Serum Dilution 1:3000; •: Serum Dilution 1:30000; for others 1:10000 serum dilution was used.

TABLE 5B Binding of 21 Clones with Late Stage Ovarian Cancer Patient Sera

The binding of a panel of 21 clones with sera from 22 late stage ovarian cancer patients was determined on macroarrays. These 21 antigens listed below bound exclusively with serum IgGs derived from late stage ovarian cancer patients but not with serum IgG from normal controls or patients with other gynecological diseases. •: Serum Dilution 1:3000; •: Serum Dilution 1:30000; all others were analyzed at a serum dilution of 1:10000; TP: Total number of patients whose serum IgGs bound to each phage clone.

The mimotope sequences and the epitopes that are the real antigens that the antibodies were produced against, based on amino acid sequence similarity (see below, Region of similarity of AA).

TABLE 6A
Description of Stage I-IV clones. Size range of the Mimotopes ≧5 amino acids.

Columns: Stage (I-IV) clone; description of the gene that is in-frame with T7 10B gene; peptide sequence of the epitope, in-frame with T7 10B gene; size of the peptide; Unigene #; region of similarity (AA); antigen expression in any type of cancer.

Clone: 1F6
Gene in-frame with T7 10B gene: gi|18490914|gb|BC022506.1| Homo sapiens, estrogen receptor binding site associated, antigen, 9RCAS
Epitope sequence: AAWQAEEVLRQQK LADREKRAAEQQR KKMEKEAQRLMKK EQNKIGVKLS (SEQ ID NO: 11)
Size: 49 AA
Unigene #: Hs.9222
Region of similarity (AA): 165-213
Antigen expression in cancer: Overexpressed in ovarian, nonsmall cell lung carcinoma, pancreatic ductal carcinoma

Clone: 2B4
Gene in-frame with T7 10B gene: gi|22042983|ref|XM_032391.3|, Homo sapiens similar to nuclear autoantigenic sperm protein (histone-binding) (NASP)
Epitope sequence: EKGGQEKQGEVIV SIEEKPKEVSEEQ PVVTLEKQGTAVE VEAESLDPTVKPV DVGGDEPEEKVVT SENEAGKAVLEQL VGQEVPPAEESPE VTTEAAEASAVEA GSEVSEKPGQEAP VLPKDGAVNGPSV VGDQTPIEPQTSI ERLTETKDGSGLE EKVRAKLVPSQEE TKLSVEESEAAGD GVDTKVAQGATEK SPEDKVQIAANEE TQER (SEQ ID NO: 12)
Size: 212 AA
Unigene #: Hs.446206
Region of similarity (AA): 258-469
Antigen expression in cancer: Expression levels are higher in myelogenous leukemia and lymphoblastic leukemia cells.

Clone: 2F7
Gene in-frame with T7 10B gene: gi|22042983|ref|XM_032391.3|, Homo sapiens similar to Nuclear autoantigenic sperm protein (NASP)
Epitope sequence: EKGGQEKQGEVIV SI (SEQ ID NO: 13)
Size: 15 AA
Unigene #: Hs.446206
Region of similarity (AA): 256-270
Antigen expression in cancer: Expression levels are higher in myelogenous leukemia and lymphoblastic leukemia cells.

Clone: 2G4
Gene in-frame with T7 10B gene: gi|20987351|gb|BC030160.1| Homo sapiens, eukaryotic translation initiation factor 5A
Epitope sequence: MADDLDFETGDAG ASATFPMQCSALR KNGFVVLKGRPCK IVEMSTSKTGKHG HAKVHLVGIDIFT GKKYEDICPSTHN MDVPNIKRNDFQL IGIQDGYLSLLQD SGEVREDLRLPEG DLGKEIEHKFDCG EQILITVLSAMTE EAAVA (SEQ ID NO: 14)
Size: 148 AA
Unigene #: Hs.310621
Region of similarity (AA): 1-148
Antigen expression in cancer: eIF-5A2, sharing 82% identity of amino acid sequence with eIF-5A, is a candidate oncogene related to development of ovarian cancer.

Clone: 3A9
Gene in-frame with T7 10B gene: gi|4507212|ref|NM_003135.1|, Homo sapiens signal recognition particle 19kDa (SRP19)
Epitope sequence: QKTGGADQSLQQG EGSKKGKGKKKK (SEQ ID NO: 15)
Size: 25 AA
Unigene #: Hs.2943
Region of similarity (AA): 119-143
Antigen expression in cancer: Transcript generated by alternative splicing between exon 14 of the Adenomatous polyposis coli gene and SRP19 is observed and its expression is higher in Colorectal cancer

Clone: 3C11
Gene in-frame with T7 10B gene: gi|16579884|ref|NM_000968.2| Homo sapiens ribosomal protein L4 (RPL4)
Epitope sequence: ALQAKSDEKAAVA GKKPVVGKKGKKA AVGVKKQKKPLVG KKAAATKKPSPEK KPAENKPTTEDNK PAA (SEQ ID NO: 16)
Size: 68 AA
Unigene #: Hs.186350
Region of similarity (AA): 360-427
Antigen expression in cancer: over-expression of L7a and L37 mRNA is confirmed in prostate-cancer tissue samples.

Clone: 4A11
Gene in-frame with T7 10B gene: gi|4557446|ref|NM_001270.1|, Homo sapiens chromodomain helicase DNA binding protein 1 (CHD1)
Epitope sequence: QQQQQQQHQASSN SGSEEDSSSSEDS DDSSSEVKRKKHK DEDWQMSGSGSPS QSGSDSESEEERE KSSCDETESDYEP KNKVKSRK (SEQ ID NO: 17)
Size: 86 AA
Unigene #: Hs.311553
Region of similarity (AA): 107-192
Antigen expression in cancer: Not associated with cancer

Clone: 4H3
Gene in-frame with T7 10B gene: gi|20543465|ref|XM_045343.5|, Homo sapiens Nijmegen breakage syndrome 1 (nibrin) (NBS1)
Epitope sequence: PTKLPSINKSKDR ASQQQQTNSIRNY FQPSTKKRERDEE NQEMSSCKSARIE TSCSLLEQTQPAT PSLWKNKEQHLSE NEPVDTNSDPNLF T (SEQ ID NO: 18)
Size: 92 AA
Unigene #: Hs.25812
Region of similarity (AA): 433-524
Antigen expression in cancer: Three different mutations in NBS1 gene, generating truncated or aberrant NBS1 transcripts, were observed in different cancer cell lines.

Clone: 5C3
Gene in-frame with T7 10B gene: gi|20987351|gb|BC030160.1|, Homo sapiens, eukaryotic translation initiation factor 5A
Epitope sequence: MADDLDFETGDAG ASATFPMQCSALR KNGFVVLKGRPCK IVEMSTSKTGKHG HAKVHLVGIDIFT GKKYEDICPSTHN MDVPNIKRNDFQL IGIQDGYLSLLQD SGEVREDLPLPEG D (SEQ ID NO: 19)
Size: 118 AA
Unigene #: Hs.310621
Region of similarity (AA): 1-118
Antigen expression in cancer: eIF-5A2, sharing 82% identity of amino acid sequence with eIF-5A, is a candidate oncogene related to development of ovarian cancer.

Clone: 5C11
Gene in-frame with T7 10B gene: gi|535176|emb|X74818.1|HSAHNAKRS, H. sapiens mRNA of AHNAK-related sequence
Epitope sequence: PKFKMPDVHFKSPQ ISMSDIDLNLKGPK IKGDMDISVPKLEG DLKGPKVDVKGPKV GIDTPDIDIHGPEG KLKGPKFKMPDLHL KAPKISMPEVDLNL KGPKVKGDMDISLP KVEGDLKGP (SEQ ID NO: 20)
Size: 121 AA
Unigene #: Hs.378738
Region of similarity (AA): 393-512
Antigen expression in cancer: Expression level of AHNAK is higher in melanoma, promyelocytic leukemia HL-60, osteosarcoma.

Clone: 5F8
Gene in-frame with T7 10B gene: gi|7662105|ref|NM_014711.1|, Homo sapiens KIAA0419 gene product
Epitope sequence: GVCSSKVYVGKNTS EVKEDVVLGKSNQV CQSSGNHLENKVTH GLVTVEGQLTSDER GAHIMNSTCAAMPK LHEPYASSQCIASP NFGTVSGLKPASML EKNCSLQTELNKSY DVKNPSPLLMQNQN XRQQMDTPMVSCGN EQFLDNSFEK (SEQ ID NO: 21)
Size: 150 AA
Unigene #: Hs.279912
Region of similarity (AA): 434-583
Antigen expression in cancer: mRNA expression level of another antigen KIAA1416 is up-regulated in colon cancer.

Clone: 1E12T
Gene in-frame with T7 10B gene: NM_006597.3, Homo sapiens heat shock 70 kDa protein 8 (HSPA8), transcript variant 1, mRNA
Epitope sequence: LESYAFNMKATVED EKLQGKINDEDKQK ILDKCNEIINWLDK NQTAEKEEFEHQQK ELEKVCNPIITKLY QSAGGMPGGMPGGF PGGGAPPSGGASSG PTIEEVD (SEQ ID NO: 90)
Size: 105 AA

Clone: 2A7
Gene in-frame with T7 10B gene: NM_003472.3, Homo sapiens DEK oncogene (DNA binding) (DEK), mRNA
Epitope sequence: EKKNKEESSDDEDK ESEEEPPKKTAKRE KPKQKATSKSKKSV KSANVKKADSSTTK KNQNSSKKESESED SSDDEPLIKKLKKP PTDEELKETIKKLL A (SEQ ID NO: 106)
Size: 99 AA

Clone: 3H3
Gene in-frame with T7 10B gene: NM_002967.2, Homo sapiens scaffold attachment factor B (SAFB), mRNA
Epitope sequence: DLRAELRKRNVDSS GNKSVLMERLKKAI EDEGGNPDEIEITS EGNKKTSKRSSKGR KPEEEGVEDNGLEE NSGDGQEDVETSLE NLQDIDIMDISVLD EAEIDNGSVADCVE DDDADNLQESLSDS RELVEGEMKELPEQ LQEHAIEDKETINN LDTSSSDFTILQEI EEPSLEPENEKILD ILGESLRPHSSN (SEQ ID NO: 135)
Size: 194 AA

Clone: 4A8
Gene in-frame with T7 10B gene: NM_003609.2, Homo sapiens HIRA interacting protein 3 (HIRIP3), mRNA
Epitope sequence: GIISSDGESN (SEQ ID NO: 136)
Size: 10 AA

Clone: 4F2_1
Gene in-frame with T7 10B gene: NM_000122.1, Homo sapiens excision repair cross-complementing rodent repair deficiency, complementation group 3 (xeroderma pigmentosum group B complementing) (ERCC3), mRNA. Protein ID: NP_000113.1
Epitope sequence: LQDPVIRECRLRNS EGEATELITETFTS KSAISKTAESSGGP STSRVTDPQGKSDI PMDLFDFYEQMDKL AAALE (SEQ ID NO: 145)
Size: 75 AA

Clone: 5D4
Gene in-frame with T7 10B gene: sp|Q96JP5.1|ZFP91_HUMAN, Zinc finger protein 91 homolog; Short = Zfp-91; Length = 570
Epitope sequence: CGFTCRQKASLNWH MKKHDADSFYQFSC NICGKKFEKKDSVV AHKAKSHPEVLIAE ALAANAGAQACGRT RVTS (SEQ ID NO: 150)
Size: 74 AA

Clone: 65A6
Gene in-frame with T7 10B gene: NM_030920.2, Homo sapiens acidic (leucine-rich) nuclear phosphoprotein 32 family, member E (ANP32E), mRNA
Epitope sequence: EEVGLSYLMKEEIQ DEEDDDDYVEEGEE EEEEEEGGLRGEKR KRDAEDDGEEEDD (SEQ ID NO: 159)
Size: 55 AA

Clone: 2H3
Gene in-frame with T7 10B gene: NM_006136.2, Homo sapiens capping protein (actin filament) muscle Z-line, alpha 2 (CAPZA2), mRNA
Epitope sequence: DWNKILSYKIGKE MQNA (SEQ ID NO: 170)
Size: 17 AA

Clone: 2C10
Gene in-frame with T7 10B gene: NM_001042483.1, Homo sapiens nuclear protein 1 (NUPR1), transcript variant 1, mRNA
Epitope sequence: ERKKRGARR (SEQ ID NO: 182)
Size: 9 AA

The above sub-table shows antigens and not mimotopes; the sub-table below shows the mimotopes.

Columns: Stage (I-IV) clone; description of the gene in the mimotope clone; peptide sequence of the mimotope, in-frame with T7 10B gene; size of the peptide; description of the sequence that the mimotope mimics; Unigene #; region of similarity (AA); antigen expression in any type of cancer.

Clone: 2H9
Gene in mimotope clone: gi|21619682|gb|BC032762.1|, Homo sapiens optineurin, mRNA
Mimotope sequence: ELLRT (SEQ ID NO: 22)
Size: 5 AA
Sequence mimicked: gi|20139301|sp|Q9Y446|PKP3_HUMAN, Plakophilin 3
Unigene #: Hs.148074
Region of similarity (AA): 407-411
Alignment: Score = 18.9 bits (37), Expect = 827, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query^(b):    1 ELLRT 5
                ELLRT
Sbjct^(c):  407 ELLRT 411
Antigen expression in cancer: Immunohistochemical localization of plakophilins (PKP1, PKP2, PKP3, and p0071) in primary oropharyngeal tumors

Clone: 3B12
Gene in mimotope clone: gi|21735624|ref|NM_145690.1|, Homo sapiens tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide (YWHAZ), transcript variant 2, mRNA.
Mimotope sequence: GQTSM (SEQ ID NO: 23)
Size: 5 AA
Sequence mimicked: gi|729143|sp|P38936|CDN1_HUMAN, Cyclin-dependent kinase inhibitor 1 (p21) (CDK-interacting protein 1) (Melanoma differentiation associated protein 6) (MDA-6)
Unigene #: Hs.370771
Region of similarity (AA): 144-147
Alignment: Score = 16.8 bits (32), Expect = 3595, Identities = 4/4 (100%), Positives = 4/4 (100%)
Query:    2 QTSM 5
            QTSM
Sbjct:  144 QTSM 147
Antigen expression in cancer: mda-6 (p21) may function as a negative regulator of melanoma growth, progression and metastasis

Clone: 5D8
Gene in mimotope clone: gi|22024583|gb|AC087376.5|, Homo sapiens chromosome 11, clone RP11-230O19, complete sequence
Mimotope sequence: KKGPI (SEQ ID NO: 24)
Size: 5 AA
Sequence mimicked: gi|20177863|sp|Q9BXJ2|CQT7_HUMAN, Complement-c1q and tumor necrosis factor-related protein 7 precursor
Unigene #: Hs.153714
Region of similarity (AA): 102-106
Alignment: Score = 18.5 bits (36), Expect = 1109, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query:    1 KKGPI 5
            KKGPI
Sbjct:  102 KKGPI 106
Antigen expression in cancer: TNF-alpha regulates expression of downstream components of complement system and plays a role in energy homeostasis where it is implicated in cachexia, obesity and insulin resistance.

Clone: 4A4
Gene in mimotope clone: gi|17028354|gb|BC017483.1|BC017483, Homo sapiens, clone IMAGE: 3506553, mRNA.
Mimotope sequence: AKVIMR (SEQ ID NO: 25)
Size: 6 AA
Sequence mimicked: gi|5921908|sp|O43174|CP26_HUMAN|, Cytochrome P450 26 (Retinoic acid-metabolizing cytochrome) (P450RAI) (hP450RAI) (Retinoic acid 4-hydroxylase)
Unigene #: Hs.150595
Region of similarity (AA): 138-142
Alignment: Score = 20.6 bits (41), Expect = 255, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query:    2 KVIMR 6
            KVIMR
Sbjct:  138 KVIMR 142
Antigen expression in cancer: all-trans-Retinoic acid-induced expression and regulation of retinoic acid 4-hydroxylase (CYP26) in human promyelocytic leukemia

Clone: 5A3
Gene in mimotope clone: gi|15011541|gb|AF397158.1|AF397158, Homo sapiens clone 11 pur alpha-associated ribosomal RNA gene, partial sequence
Mimotope sequence: YACLKD (SEQ ID NO: 26)
Size: 6 AA
Sequence mimicked: gi|1170473|sp|P42575|ICE2_HUMAN, Caspase-2 precursor (CASP-2) (ICH-1 protease)
Unigene #: Hs.433103
Region of similarity (AA): 351-355
Alignment: Score = 20.2 bits (40), Expect = 342, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query:    1 YACLK 5
            YACLK
Sbjct:  351 YACLK 355
Antigen expression in cancer: CASP-3, CASP-4, CASP-2 heterogeneously coexpress in leukemic cell lines

Clone: 2A3
Gene in mimotope clone: gi|23271193|gb|BC036014.1|, Homo sapiens poly(A) polymerase alpha, mRNA
Mimotope sequence: QILFMDP (SEQ ID NO: 27)
Size: 7 AA
Sequence mimicked: gi|729597|sp|P39086|GLK1_HUMAN, Glutamate receptor, ionotropic kainate 1 precursor
Unigene #: Hs.222405
Region of similarity (AA): 242-246
Alignment: Score = 21.4 bits (43), Expect = 142, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query:    1 QILFM 5
            QILFM
Sbjct:  242 QILFM 246
Antigen expression in cancer: Ionotropic and metabotropic glutamate receptor protein expression in glioneuronal tumours from patients with intractable epilepsy

Clone: 4C10
Gene in mimotope clone: gi|24756892|gb|AC008507.10|, Homo sapiens chromosome 19 clone CTC-448F2, complete sequence
Mimotope sequence: LNTVNTLI (SEQ ID NO: 28)
Size: 8 AA
Sequence mimicked: gi|13633936|sp|Q9NPR2|SM4B_HUMAN, Semaphorin 4B
Unigene #: Hs.416077
Region of similarity (AA): 440-445
Alignment: Score = 21.8 bits (44), Expect = 106, Identities = 6/6 (100%), Positives = 6/6 (100%)
Query:    1 NTVNTL 6
            NTVNTL
Sbjct:  440 NTVNTL 445
Antigen expression in cancer: Not associated with cancer

Clone: 4D9
Gene in mimotope clone: gi|21629397|gb|AC099571.2|, Homo sapiens chromosome 1 clone RP11-438H8, complete sequence
Mimotope sequence: GNSILLIA (SEQ ID NO: 29)
Size: 8 AA
Sequence mimicked: gi|2842764|sp|Q99735|GST2_HUMAN, Microsomal glutathione S-transferase 2 (Microsomal GST-2)
Unigene #: Hs.81874
Region of similarity (AA): 3-10
Alignment: Score = 21.4 bits (43), Expect = 140, Identities = 7/8 (87%), Positives = 7/8 (87%)
Query:    1 GNSILLIA 8
            GNSILL A
Sbjct:    3 GNSILLAA 10
Antigen expression in cancer: GST-pi has significance in the diagnosis of cancers as it is expressed abundantly in tumor cells.

Clone: 2E11
Gene in mimotope clone: gi|22004067|dbj|AP005356.2|, Homo sapiens genomic DNA, chromosome 8q23, clone: KB1198A4, complete sequence.
Mimotope sequence: WDLKSEYS (SEQ ID NO: 30)
Size: 8 AA
Sequence mimicked: gi|1710146|sp|P49798|RGS4_HUMAN, Regulator of G-protein signaling 4 (RGS4) (RGP4)
Unigene #: Hs.386726
Region of similarity (AA): 80-85
Alignment: Score = 21.8 bits (44), Expect = 106, Identities = 6/6 (100%), Positives = 6/6 (100%)
Query:    3 LKSEYS 8
            LKSEYS
Sbjct:   80 LKSEYS 85
Antigen expression in cancer: RGS4 is highly expressed in brain regions implicated in pathophysiology of schizophrenia

Clone: 5G9
Gene in mimotope clone: gi|20072204|gb|BC026241.1|, Homo sapiens ubiquitin-protein isopeptide ligase (E3), mRNA
Mimotope sequence: PGCSTTLS (SEQ ID NO: 31)
Size: 8 AA
Sequence mimicked: gi|14423962|sp|O94966|UBPJ_HUMAN, Ubiquitin carboxyl-terminal hydrolase 19
Unigene #: Hs.255596
Region of similarity (AA): 940-947
Alignment: Score = 18.9 bits (37), Expect = 827, Identities = 6/8 (75%), Positives = 7/8 (87%)
Query:    1 PGCSTTLS 8
            PGC+T LS
Sbjct:  940 PGCTTLLS 947
Antigen expression in cancer: Ubiquitin carboxyl-terminal-hydrolase L1 genes cause autosomal dominant familial Parkinson disease.

Clone: 4H4
Gene in mimotope clone: gi|20072204|gb|BC026241.1|, Homo sapiens ubiquitin-protein isopeptide ligase (E3), mRNA.
Mimotope sequence: PRCSTTLS (SEQ ID NO: 32)
Size: 8 AA
Sequence mimicked: gi|6225843|sp|O60760|PGD2_HUMAN, Glutathione-requiring prostaglandin D synthase
Unigene #: Hs.128433
Region of similarity (AA): 156-160
Alignment: Score = 18.9 bits (37), Expect = 827, Identities = 5/5 (100%), Positives = 5/5 (100%)
Query:    3 CSTTL 7
            CSTTL
Sbjct:  156 CSTTL 160
Antigen expression in cancer: Lipocalin-type prostaglandin D synthase (L-PGDS) has recently been shown to be expressed in human brain tumors, breast tumors and in ovarian cancer.

Clone: 2C7
Gene in mimotope clone: gi|3152628|gb|AC004744.1|AC004744, Homo sapiens BAC clone GS1-465N13 from 7, complete sequence
Mimotope sequence: GDRSQLWRK (SEQ ID NO: 33)
Size: 9 AA
Sequence mimicked: gi|24211441|sp|Q13443|AD09_HUMAN, ADAM 9 precursor (A disintegrin and metalloproteinase domain 9)
Unigene #: Hs.2442
Region of similarity (AA): 720-725
Alignment: Score = 20.2 bits (40), Expect = 342, Identities = 5/6 (83%), Positives = 5/6 (83%)
Query:    3 RSQLWR 8
            R QLWR
Sbjct:  720 RDQLWR 725
Antigen expression in cancer: Expression of ADAM-9 mRNA and protein in human breast cancer

Clones: 4A3, 2E1
Gene in mimotope clone: gi|16160856|ref|XM_007763.5|, Homo sapiens myosin VA (heavy polypeptide 12, myoxin) (MYO5A), mRNA
Mimotope sequence: KKQSSWYQI (SEQ ID NO: 35)
Size: 9 AA
Sequence mimicked: gi|12498310|sp|Q12882|DPYD_HUMAN, Dihydropyrimidine dehydrogenase [NADP+] precursor (DPD) (DHPDHase) (Dihydrouracil dehydrogenase) (Dihydrothymine dehydrogenase)
Unigene #: Hs.1602
Region of similarity (AA): 497-502
Alignment: Score = 21.8 bits (44), Expect = 106, Identities = 5/6 (83%), Positives = 6/6 (100%)
Query:    2 KQSSWY 7
            KQ+SWY
Sbjct:  497 KQASWY 502
Antigen expression in cancer: Higher DPD activity in gastric cancer is observed than in colorectal cancer

Clone: 4G8
Gene in mimotope clone: gi|15778776|gb|AC012363.6|, Homo sapiens BAC clone RP11-438O12 from 2, complete sequence
Mimotope sequence: PEGGTDASR (SEQ ID NO: 36)
Size: 9 AA
Sequence mimicked: gi|13634077|sp|Q9Y493|ZAN_HUMAN, Zonadhesin
Unigene #: Hs.307004
Region of similarity (AA): 1912-1919
Alignment: Score = 18.9 bits (37), Expect = 827, Identities = 6/8 (75%), Positives = 7/8 (87%)
Query:    2 EGGTDASR 9
            EGGT+A R
Sbjct: 1912 EGGTEAFR 1919
Antigen expression in cancer: zonadhesin functions during fertilization to anchor the acrosomal shroud to the zona pellucida

Clone: 2E10
Gene in mimotope clone: gi|20521965|dbj|AB051476.2|, Homo sapiens mRNA for KIAA1689 protein, partial cds
Mimotope sequence: ASFTLKLQS (SEQ ID NO: 37)
Size: 9 AA
Sequence mimicked: gi|6226869|sp|P34932|HS74_HUMAN, HEAT SHOCK 70 KDA PROTEIN 4 (HEAT SHOCK 70-RELATED PROTEIN APG-2)
Unigene #: Hs.90093
Region of similarity (AA): 647-653
Alignment: Score = 21.8 bits (44), Expect = 106, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:    2 SFTLKLQ 8
            SFTLKL+
Sbjct:  647 SFTLKLE 653
Antigen expression in cancer: Expression of HSP70 is observed in human hepatocellular carcinoma

Clone: 2D1
Gene in mimotope clone: gi|4504522|ref|NM_002157.1|, Homo sapiens heat shock 10 kDa protein 1 (chaperonin 10) (HSPE1), mRNA
Mimotope sequence: GGGSNGRTSV (SEQ ID NO: 38)
Size: 10 AA
Sequence mimicked: gi|20137621|sp|O95071|EDD_HUMAN, Ubiquitin--protein ligase EDD (Hyperplastic discs protein homolog) (hHYD) (Progestin induced protein)
Unigene #: Hs.94262
Region of similarity (AA): 140-148
Alignment: Score = 21.8 bits (44), Expect = 105, Identities = 7/9 (77%), Positives = 9/9 (100%)
Query:    1 GGGSNGRTS 9
            GGGS+GR+S
Sbjct:  140 GGGSSGRSS 148
Antigen expression in cancer: EDD, the human orthologue of the hyperplastic discs tumour suppressor gene, is amplified and overexpressed in cancer

Clone: 5H6
Gene in mimotope clone: gi|40849829|gb|AAR95625| NADH dehydrogenase subunit 4
Mimotope sequence: NSFLMTSSKPR (SEQ ID NO: 39)
Size: 11 AA
Sequence mimicked: gi|12643618|sp|O60242|BAI3_HUMAN, Brain-specific angiogenesis inhibitor 3 precursor
Unigene #: Hs.334087
Region of similarity (AA): 694-699
Alignment: Score = 20.6 bits (41), Expect = 254, Identities = 5/6 (83%), Positives = 6/6 (100%)
Query:    1 NSFLMT 6
            NS+LMT
Sbjct:  694 NSYLMT 699
Antigen expression in cancer: BAI1 expression inhibit stromal vascularization in pulmonary adenocarcinoma

Clone: 2C1
Gene in mimotope clone: gi|23958536|gb|BC036216.1|, Homo sapiens cullin 4B, mRNA
Mimotope sequence: ACSSTVSFIWI (SEQ ID NO: 40)
Size: 11 AA
Sequence mimicked: gi|33112422|sp|Q16827|PTPO_HUMAN, Receptor-type protein-tyrosine phosphatase O precursor (Glomerular epithelial protein 1) (Protein tyrosine phosphatase U2) (PTPase U2) (PTP-U2)
Unigene #: Hs.160871
Region of similarity (AA): 623-629
Alignment: Score = 21.8 bits (44), Expect = 128, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:    3 SSTVSFI 9
            SST+SFI
Sbjct:  623 SSTISFI 629
Antigen expression in cancer: Functional involvement of PTP-U2L in apoptosis subsequent to terminal differentiation of monoblastoid leukemia cells

Clone: 2G2
Gene in mimotope clone: gi|25988997|gb|AF541939.1|, His-3 integration vector pJHAM007, complete sequence
Mimotope sequence: KKKKKKKRVGGPLQ (SEQ ID NO: 41)
Size: 14 AA
Sequence mimicked: gi|20532388|sp|Q9NVP1|DD18_HUMAN, ATP-dependent RNA helicase DDX18 (DEAD-box protein 18) (Myc-regulated DEAD-box protein) (MrDb)
Unigene #: Hs.363492
Region of similarity (AA): 108-115
Alignment: Score = 27.4 bits (57), Expect = 2.8, Identities = 8/8 (100%), Positives = 8/8 (100%)
Query:    1 KKKKKKKR 8
            KKKKKKKR
Sbjct:  108 KKKKKKKR 115
Antigen expression in cancer: The expression of MrDb is induced upon proliferative stimulation of primary human fibroblasts as well as B cells and down-regulated during terminal differentiation of HL60 leukemia cells

Clone: 4G9
Gene in mimotope clone: gi|17136149|ref|NM_014708.2|, Homo sapiens kinetochore associated 1 (KNTC1), mRNA
Mimotope sequence: GPVFICSSNCFKIT (SEQ ID NO: 42)
Size: 14 AA
Sequence mimicked: gi|115892|sp|P16870|CBPH_HUMAN, Carboxypeptidase H precursor (CPH) (Carboxypeptidase E) (CPE) (Enkephalin convertase) (Prohormone processing carboxypeptidase)
Unigene #: Hs.75360
Region of similarity (AA): 333-340
Alignment: Score = 24.4 bits (50), Expect = 18, Identities = 7/8 (87%), Positives = 8/8 (100%)
Query:    7 SSNCFKIT 14
            SSNCF+IT
Sbjct:  333 SSNCFEIT 340
Antigen expression in cancer: Expression of the protein product of the PCPH proto-oncogene in human tumor cell lines

Clone: 2E12
Gene in mimotope clone: gi|22062543|ref|XM_170670.1|, Homo sapiens putative transmembrane protein; homolog of yeast Golgi membrane protein Yif1p (Yip1p-interacting factor) (54TM), mRNA.
Mimotope sequence: APFTCWPTVATNTWE (SEQ ID NO: 43)
Size: 15 AA
Sequence mimicked: gi|128062|sp|P08473|NEP_HUMAN, Neprilysin (Neutral endopeptidase) (NEP) (Enkephalinase) (Common acute lymphocytic leukemia antigen) (CALLA) (Neutral endopeptidase 24.11) (CD10)
Unigene #: Hs.307734
Region of similarity (AA): 167-175
Alignment: Score = 23.5 bits (48), Expect = 32, Identities = 7/10 (70%), Positives = 7/10 (70%), Gaps = 1/10 (10%)
Query:    6 WPTVATNTWE 15
            WP VAT  WE
Sbjct:  167 WP-VATENWE 175
Antigen expression in cancer: Loss or decrease in expression of NEP has been reported in brain cancer, renal cancer and invasive bladder cancer.

Clone: 1B5
Gene in mimotope clone: gi|12654862|gb|BC001275.1|BC001275, Homo sapiens annexin A1, mRNA
Mimotope sequence: TDQSSISPGNRKAPG (SEQ ID NO: 44)
Size: 15 AA
Sequence mimicked: gi|6707734|sp|Q13077|TRA1_HUMAN, TNF receptor associated factor 1 (TRAF1)
Unigene #: Hs.531251
Region of similarity (AA): 64-70
Alignment: Score = 21.0 bits (42), Expect = 187, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:    5 SISPGNR 11
            SISPG+R
Sbjct:   64 SISPGSR 70
Antigen expression in cancer: Tumor necrosis factor receptor-associated factor 1 gene overexpression in B-cell chronic lymphocytic leukemia

Clone: 4B2
Gene in mimotope clone: gi|23272851|gb|BC035645.1|, Homo sapiens, Similar to RIKEN cDNA 3830613O22 gene, clone
Mimotope sequence: RIMGGGIQRETWISS (SEQ ID NO: 45)
Size: 15 AA
Sequence mimicked: gi|20139133|sp|Q9BZF3|ORP6_HUMAN, Oxysterol binding protein-related protein 6
Unigene #: Hs.318775
Region of similarity (AA): 906-912
Alignment: Score = 21.8 bits (44), Expect = 104, Identities = 5/7 (71%), Positives = 6/7 (85%)
Query:    8 QRETWIS 14
            QRE W+S
Sbjct:  906 QREAWVS 912
Antigen expression in cancer: Oxysterols are potent signalling lipids that directly bind liver X receptors (LXRs). Oxysterol-regulated function of LXRs is to control the expression of genes involved in reverse cholesterol transport, catabolism of cholesterol, and lipogenesis

Clone: 5C6
Gene in mimotope clone: gi|22797897|emb|AL160171.27|, Human DNA sequence from clone RP11-256E16 on chromosome 1, complete sequence
Mimotope sequence: ICGSWGKYNLWQSS SSK (SEQ ID NO: 46)
Size: 17 AA
Sequence mimicked: gi|12644310|sp|P53618|COPB_HUMAN, Coatomer beta subunit (Beta-coat protein) (Beta-COP)
Unigene #: Hs.3059
Region of similarity (AA): 250-257
Alignment: Score = 22.3 bits (45), Expect = 93, Identities = 7/8 (87%), Positives = 7/8 (87%)
Query:    8 YNLWQSSS 15
            YNL QSSS
Sbjct:  250 YNLLQSSS 257
Antigen expression in cancer: A major component of the coat of non-clathrin-coated vesicles, beta-COP, mediate membrane traffic through the Golgi complex

Clone: 3C8
Gene in mimotope clone: gi|24234687|ref|NM_004134.3|, Homo sapiens heat shock 70 kDa protein 9B (mortalin-2) (HSPA9B), nuclear gene encoding mitochondrial protein, mRNA.
Mimotope sequence: EILKPEGQHMKLRSE ETS (SEQ ID NO: 47)
Size: 18 AA
Sequence mimicked: gi|2493676|sp|Q12889|OGP_HUMAN, Oviduct-specific glycoprotein precursor (Oviductal glycoprotein)
Unigene #: Hs.1154
Region of similarity (AA): 585-599
Alignment: Score = 24.4 bits (50), Expect = 21, Identities = 10/15 (66%), Positives = 10/15 (66%), Gaps = 1/15 (6%)
Query:    5 PEGQHMKLRSEE-TS 18
            PEGQ M LR E  TS
Sbjct:  585 PEGQTMPLRGENLTS 599
Antigen expression in cancer: Oviduct specific glycoproteins are involved in variety of roles during fertilization and early embryonic development

Clone: 1H1
Gene in mimotope clone: gi|22024587|gb|AC103702.3|, Homo sapiens chromosome 17, clone RP11-357H14, complete sequence
Mimotope sequence: AKARALARRSEPCS TGKLQLR (SEQ ID NO: 48)
Size: 21 AA
Sequence mimicked: gi|12230848|sp|O95049|ZO3_HUMAN, Tight junction protein ZO-3 (Zonula occludens 3 protein)
Unigene #: Hs.25527
Region of similarity (AA): 853-862
Alignment: Score = 23.5 bits (48), Expect = 38, Identities = 8/10 (80%), Positives = 8/10 (80%)
Query:    3 ARALARRSEP 12
            A ALAR SEP
Sbjct:  853 APALARSSEP 862
Antigen expression in cancer: Occludin expression in microvessels of neoplastic and non-neoplastic human brain

Clone: 2F10
Gene in mimotope clone: gi|21166212|gb|AC109584.2|, Homo sapiens chromosome 3 clone RP11-674P14, complete sequence.
Mimotope sequence: VQRGIGTI PSETIPVN RKRVNPP (SEQ ID NO: 49)
Size: 23 AA
Sequence mimicked: gi|118206|sp|P14416|D2DR_HUMAN, D(2) dopamine receptor
Unigene #: Hs.73893
Region of similarity (AA): 264-270
Alignment: Score = 22.7 bits (46), Expect = 56, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:   14 PVNRKRV 20
            PVNR+RV
Sbjct:  264 PVNRRRV 270
Antigen expression in cancer: Expression of dopamine receptors and transporter in neuroendocrine gastrointestinal tumor cells

Clone: 5C12
Gene in mimotope clone: gi|24430032|emb|AL939123.1|SCO939123, Streptomyces coelicolor A3(2) complete genome; segment 20/29
Mimotope sequence: VSWFPSWARSCGRQ TPLGATYKDTLLPV (SEQ ID NO: 50)
Size: 28 AA
Sequence mimicked: gi|3915660|sp|Q16850|CP51_HUMAN, Cytochrome P450 51A1 (CYPLI) (P450L1) (Sterol 14-alpha demethylase) (Lanosterol 14-alpha demethylase) (LDM) (P450-14DM)
Unigene #: Hs.512872
Region of similarity (AA): 283-292
Alignment: Score = 24.4 bits (50), Expect = 17, Identities = 8/10 (80%), Positives = 8/10 (80%)
Query:   14 QTPLGATYKD 23
            QT L ATYKD
Sbjct:  283 QTLLDATYKD 292
Antigen expression in cancer: CYP2E1 protein is expressed in both tumour and normal breast tissue with an increased expression in breast tumours.

Clone: 2H5
Gene in mimotope clone: gi|18606292|gb|BC022865.1| Homo sapiens ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit (oligomycin sensitivity conferring protein), mRNA
Mimotope sequence: DLQPPGRRWLPQQC PGSPGRCDASVPLW SDHLPSL (SEQ ID NO: 51)
Size: 35 AA
Sequence mimicked: gi|116006|sp|P08575|CD45_HUMAN, Leukocyte common antigen precursor (L-CA)
Unigene #: Hs.444324
Region of similarity (AA): 40-49
Alignment: Score = 23.5 bits (48), Expect = 41, Identities = 8/10 (80%), Positives = 8/10 (80%)
Query:   24 SVPLWSDHLP 33
            SVPL SD LP
Sbjct:   40 SVPLSSDPLP 49
Antigen expression in cancer: Expression of leucocyte-common antigen and large sialoglycoprotein on leukemic cells in B-cell chronic lymphocytic leukemia and non-Hodgkin's

Clone: 2F12
Gene in mimotope clone: gi|10443350|emb|AL133264.10|AL133264, Human DNA sequence from clone RP3-369A17 on chromosome 6p22.1-22.3 Contains ESTs, STSs, GSSs and CpG islands
Mimotope sequence: RGLGPLAAACGRSG GGGGGGAGGTGSS NVNKKTPPN (SEQ ID NO: 52)
Size: 36 AA
Sequence mimicked: gi|8928460|sp|O75962|TRIO_HUMAN, Triple functional domain protein (PTPRF interacting protein)
Unigene #: Hs.519209
Region of similarity (AA): 2232-2244
Alignment: Score = 31.2 bits (66), Expect = 0.22, Identities = 11/13 (84%), Positives = 13/13 (100%)
Query:   13 SGGGGGGGAGGTG 25
            SGGGGGGG+GG+G
Sbjct: 2232 SGGGGGGGSGGSG 2244
Antigen expression in cancer: Not associated with cancer

Clone: 5C9
Gene in mimotope clone: gi|15072584|emb|AL442003.8|, Human DNA sequence from clone RP11-324H6 on chromosome 10, complete sequence
Mimotope sequence: PMRCSCTMGEIQMQI HCGARRRKAVPSSK DNVQSSAH (SEQ ID NO: 53)
Size: 37 AA
Sequence mimicked: gi|34395516|sp|O15085|ARHB_HUMAN, Rho guanine nucleotide exchange factor 11 (PDZ-RhoGEF)
Unigene #: Hs.371602
Region of similarity (AA): 409-417
Alignment: Score = 23.1 bits (47), Expect = 72, Identities = 7/9 (77%), Positives = 7/9 (77%)
Query:    8 MGEIQMQIH 16
            M EIQ QIH
Sbjct:  409 MPEIQEQIH 417
Antigen expression in cancer: A novel gene at 11q23 named LARG for leukemia-associated Rho guanine nucleotide exchange factor (GEF) has strong sequence homology to several members of the Rho family of GEFs. Further, LARG was found to be fused with MLL in a patient with primary Rho GEF, Bcr, has been implicated in leukemia through a recurrent chromosomal translocation.

Clone: 5F9
Gene in mimotope clone: gi|18693518|gb|AC015911.8|, Homo sapiens chromosome 17, clone RP11-1094M14, complete sequence
Mimotope sequence: WRTTYISILNLAQF YYSLITVLKTFNWP GTVVHACNPSTLGG QGRRIT (SEQ ID NO: 54)
Size: 48 AA
Sequence mimicked: gi|20139105|sp|Q99959|PKP2_HUMAN, Plakophilin 2
Unigene #: Hs.25051
Region of similarity (AA): 471-492
Alignment: Score = 47.4 bits (111), Expect = 5e-06, Identities = 19/22 (86%), Positives = 19/22 (86%)
Query:   27 WPGTVVHACNPSTLGGQGRRIT 48
            WPG V HACNPSTLGGQG RIT
Sbjct:  471 WPGAVAHACNPSTLGGQGGRIT 492
Antigen expression in cancer: Immunohistochemical localization of plakophilins (PKP1, PKP2, PKP3, and p0071) in primary oropharyngeal tumors

Mimotope clones listed with peptide sequence and size only:

1A3: QDSCQEN, 7 AA (SEQ ID NO: 76)
1A4: PAYLGAHFSLPR, 12 AA (SEQ ID NO: 77)
1A9: LNLYRRHFSRD, 11 AA (SEQ ID NO: 78)
1B12: PHTKAKIFVNANN MQNTEL, 19 AA (SEQ ID NO: 79)
1B3: RSGRDNGDVGAGAP FRLSSTSQPRRIKP IAPPPRAPSPECGA GGGGPAPAGWKGSK LAAALE, 62 AA (SEQ ID NO: 80)
1B4b: ENVLVQTN, 8 AA (SEQ ID NO: 81)
2B2: SGRDNGDVGAGAPF RLSSTSQPRRIKPI APPPRAPSPECGAG GGGGGRGGGG, 52 AA (SEQ ID NO: 82)
1C4: TQSLTDFR, 8 AA (SEQ ID NO: 83)
1C8: VGKRKNGCCQSSRI YGKEPLPYKLSHFP, 28 AA (SEQ ID NO: 84)
1D1: GGWRAGAGAGAGV RVGPRVGEAGPEAR MRGG, 31 AA (SEQ ID NO: 85)
1D10: LTNKSLHYGMIEREN NSLYINNS, 23 AA (SEQ ID NO: 86)
1D4: RKRRERVGRQT, 11 AA (SEQ ID NO: 87)
1D8: RSGRPRVEGEQACG RTRVTS, 20 AA (SEQ ID NO: 88)
1E1: AKSWTN, 6 AA (SEQ ID NO: 89)
1E12B: LIQHQHLGQI, 10 AA (SEQ ID NO: 91)
1E2: RMSPH, 5 AA (SEQ ID NO: 92)
1E4T: VVTHSATLTSSPP APSSFVCPQASRW LLSISELGEASSG N, 40 AA (SEQ ID NO: 93)
1E4B: RSGRDNGDVGAGAP FRLSSTSQPRRIKP IAPPPRAPSPECGA GGSLRPHSE, 51 AA (SEQ ID NO: 94)
1F2: RSGRDNGDVGAGAP FRLSSTSQPRRIKPI AAPSARCPPPSAGA GRRLAAGRGWKGIK LAVGFYNYFTGLCL, 71 AA (SEQ ID NO: 95)
1F11: LMRNLTMRLMTGMS TRSSLSPRHHITCAG TQGGTAQATTPRVPR, 44 AA (SEQ ID NO: 96)
1F12: RGSEIFLTAMNCS HVREET, 19 AA (SEQ ID NO: 97)
1F4: AAGRGRGK, 8 AA (SEQ ID NO: 98)
1F10: SGRDNGDVGAGAPF RLSSTSQPRRIKPI APPPRAPSPECGAG GGGWRPRRRRRRPR RRRRWMLMLLLMMM MVDRGNL, 77 AA (SEQ ID NO: 99)
1G4: SGRDNGDVGAGAPF RLSSTSQPRRIKPI APPPRAPSPECGAG RRLAAAEEEEEDAP EEDVLEV, 63 AA (SEQ ID NO: 100)
1H8: ERKSCS, 6 AA (SEQ ID NO: 101)
1H9: ILLKTIFAYSCSE, 13 AA (SEQ ID NO: 102)
2A2: GSFETSSLPSDASSL CR, 17 AA (SEQ ID NO: 103)
2A5m: VRLWSW, 6 AA (SEQ ID NO: 104)
2A6: QEHDCGAAADGLAH LSDCGA, 20 AA (SEQ ID NO: 105)
2C6: LGAGGEGRRIPPP, 13 AA (SEQ ID NO: 107)
2D10: KRASKCKWL, 9 AA (SEQ ID NO: 108)
2E2: RSGRDNGDVGAGGR GASLRPHSSN, 24 AA (SEQ ID NO: 109)
2F4: CSETQAWRPLLRPAR, 15 AA (SEQ ID NO: 110)
2F8: SGRDNGDVGAGAPF RLSSTSQPRRIKPI APPPRAPSPECGAG GGGGGRGGGGGGPG GGGVGGRGGGGGGR G, 71 AA (SEQ ID NO: 111)
2G9: QKQKKANEKKEEPK, 14 AA (SEQ ID NO: 112)
2H1: LGSDERRHRAP, 11 AA (SEQ ID NO: 113)
3A1: RRGRCKPSRRWHLNN, 15 AA (SEQ ID NO: 114)
3A10: LVCATSNF, 8 AA (SEQ ID NO: 115)
3A11: FGCKSLLL, 8 AA (SEQ ID NO: 116)
3A12: PPSPPP, 6 AA (SEQ ID NO: 117)
3A3b: LNYQMKG, 7 AA (SEQ ID NO: 118)
3A5: VEPKREK, 7 AA (SEQ ID NO: 119)
3A7: PKSGHAQTELTRPD RLPFQVS, 21 AA (SEQ ID NO: 120)
3B2b: LQDPVIRECRLRNSE GEATELITETFTSKS AISKTAESSGGPSTS R, 46 AA (SEQ ID NO: 121)
3B6: GGRRWERGKQKTQAAE, 16 AA (SEQ ID NO: 122)
3D11: LSVGPACAVSSGNETVL STTTPASTTLRCIS, 31 AA (SEQ ID NO: 123)
3D5T: VDEEDMMNQVLQRSI IDQ, 18 AA (SEQ ID NO: 124)
3D7: VQAQQRSAPARAAR AGHPEAGAGMEGAG, 28 AA (SEQ ID NO: 125)
3E1: GERVSSAGGTAHGG RAGLSTRR, 22 AA (SEQ ID NO: 126)
3E10b: EGRLQDHRRRP, 11 AA (SEQ ID NO: 127)
3E7: LLFLIN, 6 AA (SEQ ID NO: 128)
3F1: SKRNKPACSKWLSWYCNE, 18 AA (SEQ ID NO: 129)
3F9: TYKIIYVVYCQKWKKPH HEETFRKPKLMNILKI YLSVKTKL, 40 AA (SEQ ID NO: 130)
3G1: GKIALSSVRTQNLLSF QALHKNV, 23 AA (SEQ ID NO: 131)
3G11T: GLCGPDPSTGRLPR RFRPAASGQPWP, 26 AA (SEQ ID NO: 132)
3G12T: KMQMNAYFLDKKSAK MVSV, 19 AA (SEQ ID NO: 133)
3G3: SQRPPQGSQLPLPAS PETATAPRKVSG, 27 AA (SEQ ID NO: 134)
4B5: NKKPLGSSVEVL, 12 AA (SEQ ID NO: 137)
4B6T: LPQCPNIGSL, 10 AA (SEQ ID NO: 138)
4B7: EVYAQREDLVDEIKL PKGEPLFFC, 24 AA (SEQ ID NO: 139)
4C4: LNRNAI, 6 AA (SEQ ID NO: 140)
4C6b: PSNLINFFKVLTLLSR SR, 18 AA (SEQ ID NO: 141)
4E1: LHYHGRAAPRAATRPG, 16 AA (SEQ ID NO: 142)
4E8: PKTMTQNSFG, 10 AA (SEQ ID NO: 143)
4F10: DRQEEETSIKVLVLE RSWNLHTLGP, 25 AA (SEQ ID NO: 144)
4F2_3: PLPPSPKPIKIKNYNK P, 17 AA (SEQ ID NO: 146)
4F4: GTATELPHRRTNKR KRLG, 18 AA (SEQ ID NO: 147)
4F8: EVDVRREDLVEEIKR RTGQPLCIC, 24 AA (SEQ ID NO: 148)
5A1: QQPGAGLPNEP, 11 AA (SEQ ID NO: 149)
5H10: ENLEI, 6 AA (SEQ ID NO: 151)
5H2: GRGDIPEIHTEVQQDCH, 17 AA (SEQ ID NO: 152)
4C9: KKRRNMLKTL, 10 AA (SEQ ID NO: 153)
4D12: PARPAREEEARRAV SHAGVVAAAETAGP, 28 AA (SEQ ID NO: 154)
2D4: GGSSRQRDGGGAGA GGGGRAGGSGPQL PRQPAG, 33 AA (SEQ ID NO: 155)
4A1: APAWVTEQDSDPKKKK ...* (SEQ ID NO: 156)
4D7: MKRIQKKESHYLN, 13 AA (SEQ ID NO: 157)
4D10: AWWLMPTVPATWE AEAGGSLEPRSQRLQ, 28 AA (SEQ ID NO: 158)
3G6: APRRTSEDGRAAQP RGAKTKATGAQAGG RAQAP, 33 AA (SEQ ID NO: 160)
2C5: RKTRYFI, 7 AA (SEQ ID NO: 161)
2G3: INKRRSFYNLSNWQ, 14 AA (SEQ ID NO: 162)
3G5: RWLEITKYIDQ, 11 AA (SEQ ID NO: 163)
4H4: KKKGGGGEGGGAGI, 14 AA (SEQ ID NO: 164)
4E8: GRNGKGEKGK, 10 AA (SEQ ID NO: 165)
1H3: RKDIKAFYYLH, 11 AA (SEQ ID NO: 166)
2B10: LWSEINIKGRGEKEQ QGRDTYIGLKR, 26 AA (SEQ ID NO: 167)
2C120: NWQKMTAY, 8 AA (SEQ ID NO: 168)
2F6: RRMAFFRL, 8 AA (SEQ ID NO: 169)
3A50: DWGYIRGSRLSN, 12 AA (SEQ ID NO: 171)
3B4: AWWRMPVIPATWEA GAGEPLEPRKRSLQ, 28 AA (SEQ ID NO: 172)
3C7: RWSRVRSWQRPQAL ETEETHRGRG, 24 AA (SEQ ID NO: 173)
3G4: LWHRIRNSEESKPG CNEVSLQQHALLGS RME, 31 AA (SEQ ID NO: 174)
4E5: PKGRRMGFFF, 10 AA (SEQ ID NO: 175)
1F120: IQQKSGNGLPKTDRPG, 16 AA (SEQ ID NO: 176)
1H10: LGCSTGEVPGRPCS RHSTSSIAAVAGPGA AGGGGAGG, 37 AA (SEQ ID NO: 177)
2G6: ASQDIRKRISQGGKG VNSRPTTYGCSG, 27 AA (SEQ ID NO: 178)
4C60: NRIRYPGSPRRKR, 13 AA (SEQ ID NO: 179)
4F7: LPKCWDYRREPPYP ADNS, 18 AA (SEQ ID NO: 180)
1F5: IPWVVVHGRS, 10 AA (SEQ ID NO: 181)
2E110: EIYNYQVTP, 9 AA (SEQ ID NO: 183)
2F120: GDVGEMLLVMRNPA NRLPAARRLMGFSR VGFSFGIFFR, 38 AA (SEQ ID NO: 184)
2F9: RKSESDSS, 8 AA (SEQ ID NO: 185)
4H5: NSSTDSCHRKSYT, 13 AA (SEQ ID NO: 186)

*The cDNA insert sequence of clone 4A1 is a region comprising a stretch of nucleotides followed by a poly A tail; therefore the translated peptide has an endless number of lysines. Western blot can determine the molecular weight of these peptides.
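The note for clone 4A1 attributes the open-ended run of lysines to translation through the poly A tail. The short Python sketch below illustrates that reasoning only. The function name `translate_poly_a` is hypothetical, and the one fact it relies on is that the codon AAA encodes lysine (K) in the standard genetic code; it is not part of the described cloning or sequencing procedure.

```python
# Illustrative sketch only; translate_poly_a is a hypothetical helper.
def translate_poly_a(tail_length: int) -> str:
    """Translate a poly(A) stretch of the given nucleotide length into peptide letters."""
    tail = "A" * tail_length
    # Split into complete codons; an incomplete trailing codon is ignored.
    codons = [tail[i:i + 3] for i in range(0, len(tail) - 2, 3)]
    # In the standard genetic code, AAA encodes lysine (K).
    return "".join("K" for codon in codons if codon == "AAA")


print(translate_poly_a(12))  # KKKK
```

Because every reading frame through a poly A stretch yields AAA codons, the peptide keeps gaining lysines for as long as the tail continues, which is consistent with the sequence for 4A1 being listed without a fixed size.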

TABLE 6B
Description of Stage II-IV clones. Size range of the Mimotopes ≧5 amino acids.

Columns: Stage (II-IV) clone; description of the gene that is in-frame with T7 10B gene; peptide sequence of the epitope in-frame with T7 10B gene; size of the peptide; Unigene #; Serex Y/N mRNA; region of similarity (AA); antigen expression in any type of cancer.

Clone: 3H1
Gene in-frame with T7 10B gene: gi|12654010|gb|BC000805.1| Homo sapiens nuclear ubiquitous casein kinase and cyclin-dependent kinase substrate, mRNA
Epitope sequence: DDDSDYGSSK KKNXKMVKKS KPERKEKKMP KPRLKATVTP SPVKGKGKVG RPTASKASKE KTPSPKEEDE EPESPPEKKT SISPPPEKSG DEGSEDEAPS GED (SEQ ID NO: 55)
Size: 103 AA
Unigene #: Hs.510265
Serex Y/N mRNA: N
Region of similarity (AA): 140-243
Antigen expression in cancer: The expressions of casein kinase II (CK2) is higher in neoplastic ovarian surface epithelium. Casein kinase II (CK II) is expressed at a higher level in lung tumours.

Clone: 2B3
Gene in-frame with T7 10B gene: gi|7023439|dbj|AK001891.1|, Homo sapiens cDNA FLJ11029 fis, clone PLACE1004156
Epitope sequence: LSTSSFDEQN (SEQ ID NO: 56)
Size: 10 AA
Unigene #: Hs.528654
Serex Y/N mRNA: N
Region of similarity (AA): 350-360
Antigen expression in cancer: Not associated with cancer

The above sub-table shows antigens and not mimotopes; the sub-table below shows the mimotopes.

Columns: Stage (II-IV) clone; description of the gene in the mimotope clone; peptide sequence of the mimotope, in-frame with T7 10B gene; size of the peptide; description of the sequence that the mimotope mimics; Unigene #; region of similarity (AA); antigen expression in any type of cancer.

Clone: 2B9
Gene in mimotope clone: gi|28837315|gb|BC047588.1| Homo sapiens KIAA1363 protein, mRNA
Mimotope sequence: VIWLIAVISF PQNYTWL (SEQ ID NO: 57)
Size: 18 AA
Sequence mimicked: gi|20141211|sp|P18825|A2AC_HUMAN, Alpha-2C-adrenergic receptor (Alpha-2C adrenoceptor) (Subtype C4)
Unigene #: Hs.123022
Region of similarity (AA): 172-185
Alignment: Score = 24.8 bits (51), Expect = 16, Identities = 10/14 (71%), Positives = 10/14 (71%), Gaps = 3/14 (21%)
Query: 2 IVV-- --LI- AVISFP 12
IV LI AVISFP
Sbjct: 172 IVAVWLISAVISFP 185
Antigen expression in cancer: Stimulation of alpha2-adrenergic inhibits cholangiocarcinoma growth through modulation of Raf-1 and B-Raf activities. Beta adrenergic receptor is overexpressed in pulmonary adenocarcinoma

Clone: 2C12
Gene in mimotope clone: gi|15072584|emb|AL442003.8|, Human DNA sequence from clone RP11-324H6 on chromosome 10, complete sequence
Mimotope sequence: PMRCSCTMGE IQMQIHCGAR RRKAVPSSKD NVQSSAH (SEQ ID NO: 58)
Size: 37 AA
Sequence mimicked: gi|2851534|sp|Q13724|GCS1_HUMAN, Mannosyl-oligosaccharide glucosidase (Processing A-glucosidase I)
Unigene #: Hs.516120
Region of similarity (AA): 4-12
Alignment: Score = 24.0 bits (49), Expect = 40, Identities = 7/9 (77%), Positives = 8/9 (88%)
Query:   18 GARRRKAVP 26
            G RRR+AVP
Sbjct:    4 GERRRRAVP 12
Antigen expression in cancer: Not associated with cancer

Clone: 2D7
Gene in mimotope clone: gi|34882281|ref|XM_236768.2|, Rattus norvegicus hypothetical LOC316116 (LOC316116), mRNA
Mimotope sequence: LRGTSGVQPP EIEQ (SEQ ID NO: 59)
Size: 14 AA
Sequence mimicked: gi|32172435|sp|P46934|NED4_HUMAN, Ubiquitin-protein ligase Nedd-4
Unigene #: Hs.1565
Region of similarity (AA): 514-520
Alignment: Score = 22.7 bits (46), Expect = 70, Identities = 6/7 (85%), Positives = 6/7 (85%)
Query:    8 QPPEIEQ 14
            QP EIEQ
Sbjct:  514 QPSEIEQ 520
Antigen expression in cancer: RING protein Trim32 associated with skin carcinogenesis has E3-ubiquitin ligase properties

Clone: 2D12
Gene in mimotope clone: gi|34783327|gb|BC022049.2|, Homo sapiens cDNA clone IMAGE:4291567, partial cds
Mimotope sequence: ILHLH (SEQ ID NO: 60)
Size: 5 AA
Sequence mimicked: gi|128616|sp|P23975|S6A2_HUMAN, Sodium-dependent noradrenaline transporter (Norepinephrine transporter) (NET)
Unigene #: Hs.78036
Region of similarity (AA): 218-222
Alignment: Score = 17.6 bits (34), Expect = 2425, Identities = 4/5 (80%), Positives = 5/5 (100%)
Query:    1 ILHLH 5
            +LHLH
Sbjct:  218 VLHLH 222
Antigen expression in cancer: NET is involved in neurotransmitter removal from neuronal synapses

Clone: 2E7
Gene in mimotope clone: gi|6330364|dbj|AB033020.1|, Homo sapiens mRNA for KIAA1194 protein
Mimotope sequence: VLSALPEKNC NTVPFQPPED LRYQHCSSRF LE (SEQ ID NO: 61)
Size: 32 AA
Sequence mimicked: gi|34395825|sp|Q9H106|PTL2_HUMAN, Protein tyrosine phosphatase non-receptor type substrate 1-like 2 precursor
Unigene #: Hs.339789
Region of similarity (AA): 167-174
Alignment: Score = 24.4 bits (50), Expect = 21, Identities = 7/8 (87%), Positives = 8/8 (100%)
Query:    2 LSALPEKN 9
            LSALPE+N
Sbjct:  167 LSALPERN 174
Antigen expression in cancer: Protein-tyrosine phosphatase (SAP-1) is overexpressed in gastrointestinal cancer

Clone: 2G10
Gene in mimotope clone: gi|6307467|gb|BC010282.1|, Homo sapiens leucine-rich PPR-motif containing, mRNA
Mimotope sequence: WGFNERDRLS SILQQRCVTL (SEQ ID NO: 62)
Size: 20 AA
Sequence mimicked: gi|13638201|sp|P41214|LIGA_HUMAN, Ligatin (Hepatocellular carcinoma-associated antigen 56)
Unigene #: Hs.274151
Region of similarity (AA): 523-528
Alignment: Score = 24.0 bits (49), Expect = 29, Identities = 6/6 (100%), Positives = 6/6 (100%)
Query:   12 ILQQRC 17
            ILQQRC
Sbjct:  523 ILQQRC 528
Antigen expression in cancer: CD15 and CD50 antigens are both overexpressed in hepatocarcinoma.

Clone: 2G11
Gene in mimotope clone: gi|7329921|emb|AL117379.14|HSJ563E14, Human DNA sequence from clone RP4-563E14 on chromosome 20 Contains the 5' of the DATF1 gene encoding the death associated transcription factor 1, the 5' end of a novel gene, ESTs, STSs, GSSs and four CpG islands, complete sequence.
Mimotope sequence: VVSGFFSTFS L (SEQ ID NO: 63)
Size: 11 AA
Sequence mimicked: gi|1705762|sp|P13569|CFTR_HUMAN, Cystic fibrosis transmembrane conductance regulator (CFTR)
Unigene #: Hs.521149
Region of similarity (AA): 429-435
Alignment: Score = 21.8 bits (44), Expect = 128, Identities = 6/7 (85%), Positives = 6/7 (85%)
Query:    5 FFSTFSL 11
            FFS FSL
Sbjct:  429 FFSNFSL 435
Antigen expression in cancer: Mutation of CFTR is observed in Cystic Fibrosis
Clone: 2H8
Gene in mimotope clone: gi|5714635|gb|AF159295.1|AF159295, Homo sapiens serine/threonine protein kinase Kp78 splice variant CTAK75a mRNA
Mimotope sequence: LTRPGHGQD (SEQ ID NO: 64)
Size: 9 AA
Sequence mimicked: gi|2499758|sp|Q92729|PTPU_HUMAN, Receptor-type protein-tyrosine phosphatase U precursor (R-PTP-U) (Protein-tyrosine phosphatase J) (PTP-J) (Pancreatic carcinoma phosphatase 2) (PCP-2)
Unigene #: Hs.19718
Region of similarity (AA): 355-361
Alignment: Score = 19.3 bits (38), Expect = 748, Identities = 6/7 (85%), Positives = 6/7 (85%)
Query:    1 LTRPGHG 7
            LTRPG G
Sbjct:  355 LTRPGDG 361
Antigen expression in cancer: A potential role of PCP-2 in cell-cell recognition and adhesion is supported by its co-localization with cell adhesion molecules, such as catenin and E-cadherin, at sites of cell-cell contact.

Clone: 4C5
Gene in mimotope clone: gi|19683998|gb|BC025957.1| Homo sapiens coated vesicle membrane protein, mRNA
Mimotope sequence: LYINEMKSKK L (SEQ ID NO: 65)
Size: 11 AA
Sequence mimicked: gi|417216|sp|P33176|KINH_HUMAN, Kinesin heavy chain (Ubiquitous kinesin heavy chain) (UKHC)
Unigene #: Hs.512922
Region of similarity (AA): 592-599
Alignment: Score = 22.3 bits (45), Expect = 95, Identities = 6/8 (75%), Positives = 8/8 (100%)
Query:    1 LYINEMKS 8
            LYI++MKS
Sbjct:  592 LYISKMKS 599
Antigen expression in cancer: Kinesin-1 links neurofibromin and merlin in a common cellular pathway of neurofibromatosis

Clone: 4H6
Gene in mimotope clone: gi|22773353|gb|AC007998.10|, Homo sapiens chromosome 18, clone RP11-322E11, complete sequence
Mimotope sequence: LPQCPSRGSL (SEQ ID NO: 66)
Size: 10 AA
Sequence mimicked: gi|1352515|sp|P48745|NOV_HUMAN, NOV protein homolog precursor (NovH) (Nephroblastoma overexpressed gene protein homolog)
Unigene #: Hs.235935
Region of similarity (AA): 37-42
Alignment: Score = 20.2 bits (40), Expect = 414, Identities = 5/6 (83%), Positives = 5/6 (83%)
Query:    2 PQCPSR 7
            PQCP R
Sbjct:   37 PQCPGR 42
Antigen expression in cancer: Altered expression of novH is associated with human adrenocortical tumorigenesis

Clone: 5A2
Gene in mimotope clone: gi|40788180|emb|AJ583821.2|, Homo sapiens mRNA for ubiquitin specific proteinase 40 (USP40 gene)
Mimotope sequence: PGWDCRLPEA ESCRFLLSSR GED (SEQ ID NO: 67)
Size: 23 AA
Sequence mimicked: gi|21759008|sp|Q96CA5|BIR7_HUMAN, Baculoviral IAP repeat-containing protein 7 (Kidney inhibitor of apoptosis protein) (KIAP) (Melanoma inhibitor of apoptosis protein) (ML-IAP) (Livin)
Unigene #: Hs.256126
Region of similarity (AA): 150-159
Alignment: Score = 22.7 bits (46), Expect = 68, Identities = 7/10 (70%), Positives = 9/10 (90%)
Query:   12 SCRFLLSSRG 21
            SC+FLL S+G
Sbjct:  150 SCQFLLRSKG 159
Antigen expression in cancer: ML-IAP, a novel inhibitor of apoptosis, is preferentially expressed in human melanomas

Clone: 5A7
Gene in mimotope clone: gi|16508181|emb|AL138765.18|, Human DNA sequence from clone RP11-34E5 on chromosome 10, complete sequence
Mimotope sequence: KKMRTKM (SEQ ID NO: 68)
Size: 7 AA
Sequence mimicked: gi|130580423|sp|Q81X29|FX16_HUMAN, F-box only protein 16
Unigene #: Hs.511876
Region of similarity (AA): 14-19
Alignment: Score = 20.6 bits (41), Expect = 310, Identities = 5/6 (83%), Positives = 6/6 (100%)
Query:    2 KMRTKM 7
            KM+TKM
Sbjct:   14 KMQTKM 19
Antigen expression in cancer: A high expression level of F-box protein, Skp2 is observed in diffuse large cell B lymphoma.

Clone: 5B9
Gene in mimotope clone: gi|27469381|gb|BC042411.1|, Mus musculus, clone IMAGE: 4014861, mRNA
Mimotope sequence: QIDSSFSIPW VWHGRS (SEQ ID NO: 69)
Size: 17 AA
Sequence mimicked: gi|126885|sp|P08235|MCR_HUMAN, Mineralocorticoid receptor (MR)
Unigene #: Hs.331409
Region of similarity (AA): 420-426
Alignment: Score = 22.3 bits (45), Expect = 93, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:    3 DSSFSIP 9
            DSSFS+P
Sbjct:  420 DSSFSVP 426
Antigen expression in cancer: Glucocorticoid and mineralocorticoid cross-talk with progesterone receptor to induce focal adhesion and growth inhibition in breast cancer cells

Clone: 5B12
Gene in mimotope clone: gi|34996477|tpg|BK001418.1|, TPA: Homo sapiens metastasis associated in lung adenocarcinoma transcript 1 long isoform, transcribed non-coding RNA, complete sequence
Mimotope sequence: GGRRSLRKPQ ISFFLFER (SEQ ID NO: 70)
Size: 18 AA
Sequence mimicked: gi|34223735|sp|Q08462|CYA2_HUMAN, Adenylate cyclase, type II (ATP pyrophosphate-lyase) (Adenylyl cyclase)
Unigene #: Hs.414591
Region of similarity (AA): 136-142
Alignment: Score = 24.4 bits (50), Expect = 21, Identities = 6/7 (85%), Positives = 7/7 (100%)
Query:   10 QISFFLF 16
            Q+SFFLF
Sbjct:  136 QVSFFLF 142
Antigen expression in cancer: In human Y-79 retinoblastoma cells, corticotropin-releasing hormone (CRH) stimulates adenylyl cyclase activity and increases cyclic AMP accumulation.
5D6 gi|16741726|gb| GIRVEPPTRT 12 AAgi|6226869|sp|P349 Hs.90093 316-328 Expression of BC016660.1|, HomoIS (SEQ ID 32|HS74_HUMAN, Score = 21.0 bits HSP70 is sapiens heat shockNO: 71) HEAT SHOCK 70 (42), Expect = 229 observed in 70kDa protein 8,KDA PROTEIN 4 Identities = 6/9 (66%), human transcript variant(HEAT SHOCK 70- Positives = 8/9 (88%) hepatocellular 1, mRNA RELATEDQuery: 3 RVEPPTRTI 11 carcinoma PROTEIN APG- RVEPP R++ 2)(HSP70RY)Sbjct: 316 RVEPPLRSV 324 5E3 gi|40849693|gb| RNRYSTARER 10 AAgi|2501463|sp|Q930 Hs.77578 1356-1361 Oxidative AY495321.1|, Homo(SEQ ID 08IFAFX_HUMAN, Score =21.4 bits Modifications sapiens isolateNO: 72) Probable ubiquitin (43), Expect = 172 and Down-V1-16 mitochondrion, carboxyl-terminal Identities = 6/6 (100%),regulation of complete genome hydrolase FAF-X Positives = 6/6 (100%)Ubiquitin Query: 5 STARER 10 Carboxyl- STARER terminalSbjct: 1356 STARER 1361 Hydrolase Ll Associated with IdiopathicParkinson's and Alzheimer's Diseases. 5H8 gi|13273214|gb|AAK1 GKRHIGGTDY10 AA gi|41713338|sp| Corres- 21-25 The expression 7820|, (SEQ IDQ8N690|D119_HUMAN ponding Score = 19.3 bits of human beta-cytochrome c oxidase NO: 73) Beta-defensin 119 Unigene (38), Expect =550 defensin genes subunit I [Homo precursor (Beta- number Identities =5/5 (100%), in oral sapiens] defensin 19) (DEFB- is not Positives =5/5 (100%) squamous cell 19) found Query: 1 GKRHI 5 carcinomas(SC GKRHICs) was Sbjct: 21 GKRHI 25 demonstrated by in situ hybridization. 5A4gi|17149463|gb|AC06 VVSQLTAEMR 12 AA gi|129825|sp|P05164| Hs. 
45827223-29 Myeloperoxidase 8228.8|, Homo LE (SEQ ID PERM_HUMAN, Score =22.7 bits immunoreactivity sapiens chromosome NO: 74) Myeloperoxidase(46), Expect = 71 is observed in 8, clone RP11- precursor (MPO)Identities = 6/7 (85%), adult acute 539E17, complete Positives =7/7 (100%) lymphoblastic sequence Query: 5 LTAEMRL 11 leukemia LTAEM + LSbjct: 23 LTAEMKL 29 5E7 gi|4885510|ref| RACQRSTWKT 21 AAgi|25453064|sp| Hs.514335 192-199 JNK interacting NM_005381.1|, HomoKEGNGQTESS Q9UPT6|JIP3_HUMAN, Score = 23.1 bits protein (JIP)sapiens nucleolin S (SEQ ID C-jun-amino- (47), Expect = 51can inhibit JNK (NCL), mRNA NO: 75) terminal kinase Identities =7/8 (87%), signaling interacting protein 3 Positives = 7/8 (87%)pathway in NPC (JNK-interacting Query: 13 GNGQTESS 20 cellprotein 3) (JIP-3) GN QTESS (nasopharyngeal Sbjct: 192 GNSQTESS 199carcinoma) gi|40849693|gb| RNRYSTARER 10 gi|2501463|sp| Hs.775781356-1361 Oxidative AY495321.1|, (SEQ ID NO: AA Q93008| Score =21.4 bits Modifications and Homo sapiens 72) FAFX_HUMAN, (43), Expect =172 Down-regulation isolate V1-16 Probable Identities = 6/6(100%),of Ubiquitin mitochondrion, ubiquitin Positives = 6/6(100%)Carboxyl-terminal complete genome carboxyl- Query: 5 STARER 10Hydrolase L1 terminal STARER Associated with hydrolase Sbjct: IdiopathicFAF-X 1356 STARER 1361 Parkinson's and Alzheimer's Diseases.gi|13273214|gb| GKRHIGGTDY 10 gi|41713338|sp| Corres- 21-25The expression of AAK178201, (SEQ ID AA Q8N690| ponding Score =19.3 bits(38), human beta- cytochrome c NO: 73) D119_HUMAN UnigeneExpect = 550 defensin genes in oxidase Beta-defensin number Identities =5/5(100%), oral squamous subunit I [Homo 119 precursor is notPositives = 5/5(100%) cell sapiens] (Beta- found Query: 1 GKRHI 5carcinomas(SCCs defensin 19) GKRHI ) was (DEFB-19) Sbjct: 21 GKRHI 25demonstrated by in situ hybridization. 
gi|7149463|gb| WSQLTAEMRL 12gi|129825|sp| Hs.45872 23-29 Myeloperoxidase AC068228.8|, E (SEQ ID AAP05164| Score = 22.7 bits immunoreactivity Homo sapiens NO: 74)PERM_HUMAN, (46), Expect =  71 is observed in chromosome MyeloperoxidaseIdentities = 6/7(85%), adult acute 8, clone precursor (MPO) Positives =7/7 (100%) lymphoblastic RP11-539E17, Query: 5 LTAEMRL 11 leukemiacomplete LTAEM + L sequence Sbjct: 23 LTAEMKL 29 gi|4885510|ref|RACQRSTWKT 21 gi|25453064|sp| Hs.514335 192-199 JNK interactingNM_005381.1|, KEGNGQTESS AA Q9UPT6| Score = 23.1 bits(47),protein (JIP) can Homo sapiens S (SEQ ID NO: JIP3_HUMAN, Expect = 51inhibit JNK nucleolin 75) C-jun- Identities = 7/8(87%), signaling(NCL), mRNA amino-terminal Positives = 7/8(87%) pathway in NPC kinaseQuery: 13 GNGQTESS 20 cell cell cell interacting GN QTESS(nasopharyngeal protein 3 Sbjct: 192 GNSQTESS carcinoma)(JNK-interacting 199 interacting protein 3) (JIP-3)
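The "Identities" figures reported for the peptide alignments in the table above can be rechecked directly from the Query and Sbjct strings. The following is a minimal sketch, not part of the original disclosure: the function name `identity` is hypothetical, it assumes ungapped segments of equal length, and it assumes the percentage is truncated rather than rounded, which matches the tabulated values (for example 6/7 reported as 85%). It does not recompute "Positives", which would require a substitution matrix.

```python
def identity(query: str, subject: str) -> tuple[int, int, int]:
    """Return (matches, aligned length, truncated percent identity).

    Hypothetical helper: both segments must be ungapped and equally long.
    """
    if len(query) != len(subject):
        raise ValueError("aligned segments must have equal length")
    # Count positions where the two residues are the same letter.
    matches = sum(q == s for q, s in zip(query, subject))
    # Integer division truncates, e.g. 6/7 -> 85 rather than 86.
    return matches, len(query), matches * 100 // len(query)


# Segments copied from the alignments for clones 2H8, 5A2 and 5D6.
for q, s in [("LTRPGHG", "LTRPGDG"),
             ("SCRFLLSSRG", "SCQFLLRSKG"),
             ("RVEPPTRTI", "RVEPPLRSV")]:
    print(q, s, identity(q, s))
```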

TABLE 7A

Selection of most significant clones from Group 1 dataset. The 26 clones are ordered according to binding with the 16 patients in Group 1. None of the sera from the 25 healthy women (belonging to Group 1) contained IgGs that bound any of these clones. Clones are shown in rows; patient numbers are shown in the columns. The last column, TP, gives the total number of patients whose serum IgGs bound to each phage clone. Δ: serum dilution 1:3000; all others were analyzed at a serum dilution of 1:10000.

TABLE 7B Binding of 26 clones with 16 patients on a new dataset (Group 2)

The rows represent the 26 clones and the columns represent the 16 patients. As shown in this table, sera from 16 out of the 16 patients in Group 2 contained IgGs that bound at least one clone. None of the IgGs in sera from 12 healthy women interacted with any of these 26 clones. Δ: serum dilution 1:3000; ▴: serum dilution 1:30000; all others were analyzed at a serum dilution of 1:10000.
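The two summaries described for Tables 7A and 7B, the per-clone TP total and the check that every patient serum binds at least one clone, can be sketched as follows. This is an illustrative sketch only: the function name `summarize`, the clone labels, and the 3 x 4 binding matrix are hypothetical placeholders, not data from the tables.

```python
def summarize(binding: dict[str, list[int]]) -> tuple[dict[str, int], list[int]]:
    """Summarize a clone-by-patient binding matrix (1 = bound, 0 = not bound).

    Returns the TP count per clone and the 1-based numbers of patients
    whose serum bound none of the clones.
    """
    # TP: total number of patients whose serum IgGs bound each clone.
    tp = {clone: sum(row) for clone, row in binding.items()}
    n_patients = len(next(iter(binding.values())))
    # Patients with no positive entry in any clone row.
    unbound = [j + 1 for j in range(n_patients)
               if not any(row[j] for row in binding.values())]
    return tp, unbound


# Hypothetical example: 3 clones (rows) x 4 patients (columns).
example = {
    "clone_A": [1, 0, 1, 0],
    "clone_B": [0, 1, 0, 0],
    "clone_C": [0, 0, 1, 1],
}
tp, unbound = summarize(example)
print(tp)       # TP per clone
print(unbound)  # empty when every patient is bound by at least one clone
```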

TABLE 8 Proteins Identified as Overexpressed by IHC in OVCA Through Literature Mining

No.  Gene Symbol  Function                              Histotype Studied          Source    PMID
1.   PARP         Chromatin Modification                rsc                        PP        17413981
2.   CTSB         Protease                              nhs                        PP        14984956
3.   CCNE         Cell cycle regulator                  se, mu, en, cc             PP        11585414
4.   CLDN4        Receptor                              se, metastatic se,         PP        15277215
5.   CLU          Associated with apoptosis             se, non serous             PP        15578711
6.   CYP1B1       Mixed function monooxygenase          se, mu, en, cc, MMMT       PP        11461084
7.   EIF5A2       Protein biosynthesis                  nhs                        PP        16424057; 15205331
8.   FSCN         Actin binding protein                 se, mu, en, cc, other      PP        18498068
9.   FGF8         Growth and development                se, mu, en, cc             PP        11072239
10.  HE4; WFDC2   Protease inhibitor                    se, mu, en, cc             PP        16607372
11.  IGFBP5       Prolong the half life of IGFs         se, mu, en, cc             PP        16729015
12.  MAGEA4       Tumor antigen, Development            se                         PP (SX)   14695148
13.  NRG1         Signaling Protein                     se, en, cc                 PP        12473609
14.  PPARG        Receptor                              se, mu, en, cc, mixed      PP        15583697
15.  TAG-72       Pancarcinoma antigen                  se, mu, en, cc             PP        17210225
16.  TGFB1        Growth & differentiation              se, mu                     PP        16835828
17.  VEGF-A       Angiogenesis                          se, mu, en                 PP        18343598; 16835828
18.  VEGF-C       Angiogenesis                          se, mu, en                 PP        18343598; 16835828
19.  MEIS1        Transcription factor                  se, mu, en, cc, other      PW        17949970
20.  PAX8         Transcription factor                  se, mu, en, cc             PW        18724243
21.  CKS1B        Cell cycle regulator                  se                         Lit       16572426
22.  CLDN3        Receptor                              se, mu, en, cc             Lit       15161682
23.  EDD          Binds ubiquitin                       nhs                        Lit (SX)  18349819
24.  FLT1         Binds to VEGF                         se, mu                     Lit       16835828
25.  MUC1         Signaling                             se, mu, en, cc             Lit       16061277; 15161682
26.  MUC16        Ovarian cancer antigen CA125          Mouse model                Lit       18637025
27.  PRAME        Repressor of retinoic acid receptor   se                         Lit       18709641
28.  RalBP1       Multidrug resistance                  nhs                        Lit       17954908
29.  S100A1       Interact with hsp's and CYP40         se, metastatic se, others  Lit       15277215
30.  VEGF-D       Angiogenesis                          se, mu, en                 Lit       18343598; 16835828

se: Serous; mu: Mucinous; en: Endometrioid; cc: Clear cell
PP: Plasma Proteome: http://www.plasmaproteome.org/ppihome.htm
Lit: identified solely by literature mining
PW: Pathwork (Monzon FA et al., JCO, 2009, v27: 1)
(SX): also identified as a tumor antigen in the SEREX database; http://ludwig-sun5.unil.ch/CancerImmunomeDB/
nhs: No histotype specified; rsc: Randomly selected cases

REFERENCES

1. Ali-Fehmi et al. Analysis of the expression of human tumor antigens in ovarian cancer tissues. Cancer Biomarkers 6:33-48 (2010).
2. Alizadeh A A, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511 (2000).
3. An A, et al. A learning system for more accurate classifications. Lecture Notes in Artificial Intelligence, Vancouver. 1418:426-441 (1998).
4. Aunoble B, et al. Major oncogenes and tumor suppressor genes involved in epithelial ovarian cancer. Int J Oncol 16:567-76 (2000).
5. Baron A T, et al. Serum sErbB1 and Epidermal Growth Factor Levels As Tumor Biomarkers in Women with Stage III or IV Epithelial Ovarian Cancer. Cancer Epidemiology, Biomarkers & Prevention 8:129-137 (1999).
6. Bauer R, et al. Cloning and characterization of the Drosophila homologue of the AP-2 transcription factor. Oncogene 17:1911-1922 (1998).
7. Bast R C, et al. Reactivity of a monoclonal antibody with human ovarian carcinoma. J Clin Invest 68:1331-1337 (1981).
8. Bast R C, et al. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N Engl J Med 309:883-887 (1983).
9. Berek J S, et al. Serum interleukin-6 levels correlate with disease status in patients with epithelial ovarian cancer. Am J Obstet Gynecol 164:1038-1043 (1991).
10. Bittner M, et al. Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling. Nature 406:536-540 (2000).
11. Blake C, et al. UCI repository of machine learning databases (1998).
12. Boyd J, et al. Molecular genetic and clinical implications [Review]. Gynecol Oncol 64:196-206 (1997).
13. Breiman L, et al. Classification and regression trees. Wadsworth and Brooks (1984).
14. Buettner R, et al. An alternatively spliced form of AP-2 encodes a negative regulator of transcriptional activation by AP-2. Mol Cell Biol 13:4174-4185 (1993).
15. Chiao P J, et al. Elevated expression of the human ribosomal S2 gene in human tumors. Molecular Carcinogenesis 5:219-231 (1992).
16. Clark P, et al. The CN2 induction algorithm. Machine Learning 3:261-283 (1989).
17. Coleman M P, et al. Trends in cancer incidence and mortality. Lyon, France: IARC Scientific Publications 121:477-498 (1993).
18. Deyo J, et al. A novel protein expressed at high cell density but not during growth arrest. DNA and Cell Biol 17:437-447 (1998).
19. Draghici S. The Constraint Based Decomposition, accepted for publication in Neural Networks, to appear (2001).
20. Einhorn N, et al. Prospective evaluation of serum CA 125 levels for early detection of ovarian cancer. Obstet Gynecol 80:14-18 (1992).
21. Golub T R, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-537 (1999).
22. Gotlieb W H, et al. Presence of interleukins in the ascites of patients with ovarian and other intra-abdominal cancers. Cytokine 4:385-390 (1992).
23. Greenlee R T, et al. Cancer Statistics. CA Cancer J Clin 50:7-33 (2000).
24. Heath S, et al. Induction of oblique decision tree. In IJCAI-93. Washington, D.C. (1993).
25. Hogdall E V, et al. Predictive values of serum tumour markers tetranectin, OVX1, CASA and CA125 in patients with a pelvic mass. Int J Cancer 89:519-523 (2000).
26. Holschneider C H, et al. Ovarian cancer: epidemiology, biology, and prognostic factors. Semin Surg Oncol 1:3-10 (2000).
27. Houts T M. Improved 2-Color Normalization For Microarray Analyses Employing Cyanine Dyes. CAMDA (2000), Critical Assessment of Techniques for Microarray Data Mining. Duke University Medical Center, December 18-19 (2000).
28. Jacobs I J, et al. Potential screening tests for ovarian cancer, in Sharp F, Mason W P, Leake R E (eds). Ovarian Cancer. London, Chapman and Hall Medical, 197-205 (1997).
29. Jacobs I, et al. Multimodal approach to screening for ovarian cancer. Lancet 1:268-271 (1988).
30. Jacobs I, et al. The CA 125 tumor-associated antigen: a review of the literature. Hum Reprod 4:1-12 (1989).
31. Kacinski B M, et al. Macrophage colony-stimulating factor is produced by human ovarian and endometrial adenocarcinoma-derived cell lines and is present at abnormally high levels in the plasma of ovarian carcinoma patients with active disease. Cancer Cells 7:333-337 (1989).
32. Kerr, Martin, Churchill. Analysis of variance for gene expression microarray data. Journal of Computational Biology (2000).
33. Kim S Y, et al. Coordinate Control of Growth and Cytokeratin 13 Expression by Retinoic Acid. Molecular Carcinogenesis 16:6-11 (1996).
34. Kohonen T. Learning vector quantization. Neural Networks 1 (suppl. 1):303 (1988).
35. Kohonen T. Learning vector quantization. In The handbook of brain theory and neural networks, pp. 537-540. Cambridge, Mass.: MIT Press (1995).
36. MacBeath G, et al. Printing proteins as microarrays for high-throughput function determination. Science 289:1760-3 (2000).
37. Monzon et al. Multicenter validation of a 1,550-gene expression profile for identification of tumor tissue of origin. J Clin Oncol 27:2503-8 (2009).
38. Murthy K. On growing better decision trees from data. Unpublished doctoral dissertation. Johns Hopkins University (1995).
39. Musavi M, et al. On the training of radial basis functions classifiers. Neural Networks 5:595-603 (1992).
40. Patsner B, et al. Comparison of serum CA 125 and lipid associated sialic acid (LASA-P) in monitoring patients with invasive ovarian adenocarcinoma. Gynecol Oncol 30(1):98-103 (1988).
41. Peng Y S, et al. ARHI is the center of allelic deletion on chromosome 1p31 in ovarian and breast cancers. Int J Cancer 86:690-4 (2000).
42. Precup D, et al. Classification using Φ-machines and constructive function approximation. In Proc. 15th International Conf. on Machine Learning, pages 439-444. Morgan Kaufmann, San Francisco, Calif. (1998).
43. Poggio T, et al. Networks for approximation and learning. Proceedings of IEEE 78(9):1481-149 (1990).
44. Quinlan J R. C4.5: Programs for machine learning. Morgan-Kaufmann (1993).
45. Rumelhart D E, et al. Learning internal representations by error backpropagation. Parallel Distributed Processing: Explorations in the Microstructures of Cognition, MIT Press/Bradford Books (1986).
46. Schwartz P E, et al. Circulating tumor markers in the monitoring of gynecologic malignancies. Cancer 60:353-361 (1987).
47. Schmittgen T D, et al. Quantitative reverse transcription-polymerase chain reaction to study mRNA decay: comparison of endpoint and real-time methods. Anal Biochem 285:194-204 (2000).
48. Sonoda K, Nakashima M, Kaku T, Kamura T, Nakano H, Watanabe T. A novel tumor-associated antigen expressed in human uterine and ovarian carcinomas. Cancer 77:1501-9 (1996).
49. Nakashima M, Sonoda K, Watanabe T. Inhibition of cell growth and induction of apoptotic cell death by the human tumor-associated antigen RCAS1. Nat Med 5:938-42 (1999).
50. Lindstrom M S, Klangby U, Wiman K G. p14ARF homozygous deletion or MDM2 overexpression in Burkitt lymphoma lines carrying wild type p53. Oncogene 20(17):2171-7 (2001).

What is claimed is:
 1. A biosensor for use in detecting the presence of diseases, said biosensor comprising detection means for detecting a presence of at least one marker indicative of a specific disease.
 2. The biosensor according to claim 1, wherein the disease is a gynecological illness.
 3. The biosensor according to claim 1, wherein said detection means is selected from the group consisting essentially of an assay, a microarray, a macroarray, a slide, and a filter containing specific biomarkers of disease.
 4. The biosensor according to claim 1, wherein said detection means is an immunoassay.
 5. A diagnostic tool for determining the efficacy of a pharmaceutical for treating a disease, said tool comprising: detection means for detecting a presence of at least one marker indicative of a specific disease; and analyzing means operatively connected to said detection means, said analyzing means for determining fluctuations in amount of marker present in said detection means, whereby fluctuations correlate to pharmaceutical efficacy.
 6. A diagnostic tool for staging the progression of a disease, said tool comprising: detection means for detecting a presence of at least one marker indicative of disease recurrence; and analyzing means operatively connected to said detection means, said analyzing means for determining fluctuations in amount of marker present in said detection means, whereby fluctuations correlate to disease stage.
 7. The biosensor according to claim 1 for use as detecting means for detecting the efficacy of a pharmaceutical.
 8. The biosensor according to claim 1 for use as staging means for detecting the disease stage.
 9. A method of determining efficacy of a pharmaceutical for treating a disease by: administering a pharmaceutical to a sample containing markers for a disease; detecting the amount of at least one marker of the disease in the sample; analyzing the amount of the marker in the sample, whereby the amount of marker correlates to response to therapy.
 10. The method according to claim 9, wherein said analyzing step includes automatically analyzing results of said detecting step using software.
 11. A method of staging a disease by: administering a therapy to a sample containing markers for a disease; detecting the amount of at least one marker of the disease in the sample; analyzing the amount of the marker in the sample, whereby the amount of marker correlates to disease stage.
 12. The method according to claim 11, wherein said analyzing step includes automatically analyzing results of said detecting step using software.
 13. An antigen array for use in detecting the presence of disease.
 14. The array according to claim 13, wherein said array is selected from the group consisting essentially of a microarray, spinning disk, antigens bound to colored beads, and a macroarray.
 15. The array according to claim 13, wherein the disease is a gynecological disease.
 16. The array according to claim 15, wherein said gynecological disease is selected from the group consisting essentially of endometriosis, ovarian cancer, breast cancer, cervical cancer, and primary peritoneal carcinoma.
 17. The array according to claim 13 further including markers of disease selected from the list in Table 6.
 18. The array according to claim 17 further including markers of disease selected from Table 8.
 19. Therapeutic targets for treating disease, said targets comprising the antigens in the array set forth in claim 13.
 20. The targets according to claim 19, wherein said targets are personalized to the individual receiving treatment.
 21. A method of treating disease by: detecting at least one marker of the disease in a sample; analyzing the type of the marker in the sample, whereby the type of marker correlates to a specific therapy.
 22. Markers for gynecological disease selected from the list in Table 6.
 23. The markers according to claim 22 further including markers selected from the list in Table 8.
 24. An immuno-imaging agent comprising labeled antibodies, whereby said labeled antibodies are isolated and reactive to proteins overexpressed in vivo.
 25. The agent according to claim 24, wherein said protein is identified by the biosensor of claim 1.
 26. Informatics software for analyzing the arrays of claim 13, said software comprising analyzing means for analyzing the arrays.
 27. The informatics software according to claim 26, further including weighting means for modifying the analyzing means.
 28. Diagnostic or prognostic markers for molecular pathology, said markers being overexpressed or mutated proteins in tumor cells.